Abstract: Diffusion limits of MCMC methods in high dimensions provide a useful theoretical tool 
for studying computational complexity. In particular they lead directly to precise estimates of the 
number of steps required to explore the target measure, in stationarity, as a function of the dimension 
of the state space. However, to date such results have mainly been proved for target measures with a 
product structure, severely limiting their applicability. The purpose of this paper is to study diffusion 
limits for a class of naturally occurring high dimensional measures, found from the approximation of 
measures on a Hilbert space which are absolutely continuous with respect to a Gaussian reference 
measure. The diffusion limit of a random walk Metropolis algorithm to an infinite dimensional Hilbert 
space valued SDE (or SPDE) is proved, facilitating understanding of the computational complexity of 
the algorithm. 

1. Introduction 

Metropolis-Hastings methods [MRTT53, Has70] form a widely used class of MCMC methods [Liu08, RC04] 
for sampling from complex probability distributions. It is therefore of considerable interest to develop 
mathematical analyses which explain the structure inherent in these algorithms, especially structure which is 
pertinent to understanding the computational complexity of the algorithm. Quantifying computational 
complexity of an MCMC method is most naturally undertaken by studying the behavior of the method on a 
family of probability distributions indexed by a parameter, and studying the cost of the algorithm as a 
function of that parameter. In this paper we will study the cost as a function of dimension for algorithms 
applied to a family of probability distributions found from finite dimensional approximation of a measure on 
an infinite dimensional space. 

Our interest is focused on Metropolis-Hastings MCMC methods [RC04]. We study the simplest of these, 
the random walk Metropolis algorithm (RWM). Let $\pi$ be a target distribution on $\mathbb{R}^N$. To sample from $\pi$, 
the RWM algorithm creates a $\pi$-reversible Markov chain $\{x^n\}_{n=0}^{\infty}$ which moves from a current state $x^0$ to 
a new state $x^1$ via proposing a candidate $y$, using a symmetric Markov transition kernel such as a random 
walk, and accepting $y$ with probability $\alpha(x^0, y)$, where $\alpha(x, y) = 1 \wedge \frac{\pi(y)}{\pi(x)}$. Although the proposal is somewhat 
naive, within the class of all Metropolis-Hastings algorithms, the RWM is still used in many applications 
because of its simplicity. The only computational cost involved in calculating the acceptance probabilities is 
the ratio of densities $\frac{\pi(y)}{\pi(x)}$, as compared to, say, the Langevin algorithm (MALA) where one needs 
to evaluate the gradient of $\log \pi$. 
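
As a minimal sketch of the accept/reject mechanism just described, the following Python function performs one RWM step. The names `rwm_step`, `log_pi` and `standard_normal_log_density` are illustrative assumptions, not notation from the paper; the target is supplied through its log-density, which only needs to be known up to an additive constant.

```python
import math
import random


def rwm_step(x, log_pi, delta, rng):
    """One random walk Metropolis step.

    x      : current state, a list of floats
    log_pi : hypothetical callable returning log pi(x) up to an additive constant
    delta  : proposal variance, so that y = x + sqrt(delta) * rho with rho ~ N(0, I)
    rng    : random.Random instance
    """
    step = math.sqrt(delta)
    y = [xi + step * rng.gauss(0.0, 1.0) for xi in x]
    # Symmetric proposal: the acceptance probability is 1 ∧ pi(y) / pi(x).
    log_ratio = log_pi(y) - log_pi(x)
    accept_prob = math.exp(min(0.0, log_ratio))
    if rng.random() < accept_prob:
        return y, True
    return x, False


def standard_normal_log_density(x):
    # Illustrative target: a standard Gaussian on R^N (not a target from the paper).
    return -0.5 * sum(xi * xi for xi in x)


rng = random.Random(0)
state = [0.0] * 5
n_accepted = 0
for _ in range(1000):
    state, accepted = rwm_step(state, standard_normal_log_density, 0.5, rng)
    n_accepted += accepted
acceptance_rate = n_accepted / 1000
```

Only the difference `log_pi(y) - log_pi(x)` is evaluated; no gradient of $\log \pi$ is required, which is the cost advantage over MALA noted above.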

A pioneering paper in the analysis of complexity for MCMC methods in high dimensions is [RGG97] . This 
paper studied the behavior of random walk Metropolis methods when applied to target distributions with 
density 

$$\pi^N(x) = \prod_{i=1}^{N} f(x_i) \qquad (1.1)$$

where $f(x)$ is a one-dimensional probability density function. The authors considered a proposal of the form 

$$y = x + \sqrt{\delta}\,\rho, \qquad \rho \sim N(0, I_N),$$

and the objective was to study the complexity of the algorithm as a function of the dimension N of the state 
space. It was shown that choosing the proposal variance $\delta$ to scale as $\delta = 2\ell^2\lambda^2 N^{-1}$ with¹ $\lambda^{-2} = \int (f'/f)^2 f\,dx$ 
($\ell > 0$ is a parameter which we will discuss later) leads to an average acceptance probability of order 1 with 
respect to dimension $N$. Furthermore, with this choice of scaling, individual components of the resulting 
Markov chain converge to the solution of a stochastic differential equation (SDE). To state this, we define a 
continuous interpolant: 

$$z^N(t) = (Nt - k)\,x^{k+1} + (k + 1 - Nt)\,x^k, \qquad k \le Nt < k + 1. \qquad (1.2)$$

Then [RGG97] shows that, when the Markov chain is started in stationarity, $z^N \Rightarrow z$ as $N \to \infty$ in $C([0, T]; \mathbb{R})$ 
where $z$ solves the SDE² 

$$\frac{dz}{dt} = \lambda^2 h(\ell)\,[\log f(z)]' + \sqrt{2\lambda^2 h(\ell)}\,\frac{dW}{dt}, \qquad (1.3)$$
$$h(\ell) = 2\ell^2\,\Phi\!\left(-\frac{\ell}{\sqrt{2}}\right). \qquad (1.4)$$

Here $\Phi$ denotes the CDF of a standard Normal distribution, "$\Rightarrow$" denotes weak convergence and $C([0, T]; \mathbb{R})$ 
denotes the Banach space of real valued continuous functions defined on the interval $[0, T]$ endowed with the 
usual supremum norm. Note that the invariant measure of the SDE (1.3) has the density $f$ with respect to 
the Lebesgue measure. This weak convergence result leads to the interpretation that, started in stationarity 
and applied to target measures of the form (1.1), the RWM algorithm will take on the order of $N$ steps 
to explore the invariant measure. Furthermore it may be shown that the value of $\ell$ which maximizes $h(\ell)$, 
and therefore maximizes the speed of convergence of the limiting diffusion, leads to a universal acceptance 
probability, for random walk Metropolis algorithms applied to targets (1.1), of approximately 0.234. 
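
The value 0.234 can be recovered numerically from (1.4). The sketch below maximizes $h(\ell)$ by a crude grid search and evaluates the corresponding limiting acceptance probability $2\Phi(-\ell/\sqrt{2})$. The function names and the grid are illustrative assumptions; any one-dimensional optimizer would serve equally well.

```python
import math


def normal_cdf(t):
    # CDF of the standard normal, written with the complementary error function.
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def acceptance(ell):
    # Limiting average acceptance probability 2 * Phi(-ell / sqrt(2)).
    return 2.0 * normal_cdf(-ell / math.sqrt(2.0))


def h(ell):
    # Time-scale factor (1.4): h(ell) = 2 * ell^2 * Phi(-ell / sqrt(2)).
    return ell * ell * acceptance(ell)


# Crude grid search over ell in (0, 5] with spacing 1e-4.
grid = [i * 1e-4 for i in range(1, 50001)]
ell_opt = max(grid, key=h)
acc_opt = acceptance(ell_opt)
print(round(ell_opt, 2), round(acc_opt, 3))
```

The maximizer is close to $\ell \approx 1.68$, and the acceptance probability at the maximizer agrees with 0.234 to three decimal places.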

These ideas have been generalized to other proposals, such as the MALA algorithm in [RR98]. For Langevin 
proposals the scaling of $\delta$ which achieves order 1 acceptance probabilities is $\delta \propto N^{-1/3}$ and the choice of the 
constant of proportionality which maximizes the speed of the limiting SDE results from an acceptance 
probability of approximately 0.574. Note, in particular, that this method will take on the order of $N^{1/3}$ steps 
to explore the invariant distribution. This quantifies the advantage of using information about the gradient 
of $\log \pi$ in the proposal: RWM algorithms, which do not use this information, take on the order of $N$ steps. 

The work by Roberts and co-workers was amongst the first to develop a mathematical theory of Metropolis- 
Hastings methods in high dimension, and does so in a fashion which leads to clear criteria which practitioners 
can use to optimize algorithmic performance, for instance by tuning the acceptance probabilities to 0.234 
(RWM) or 0.574 (MALA). Yet it is open to the criticism that, from a practitioner's perspective, target 
measures of the form (1.1) are too limited a class of probability distributions to be useful and, in any case, 
can be tackled by sampling a single one-dimensional target because of the product structure. There have 
been papers which generalize this work to target measures which retain the product structure inherent in 
(1.1), but are no longer i.i.d. (see [Bed07, RR01]): 

$$\pi^N(x) = \prod_{i=1}^{N} \lambda_i^{-1} f(\lambda_i^{-1} x_i). \qquad (1.5)$$

However, the same criticism may be applied to this scenario as well. 

Despite the apparent simplicity of target measures of the form (1.1) and (1.5) the intuition obtained from 
the study of Metropolis-Hastings methods applied to these models with product structure is in fact extremely 
valuable. The two key results which need to be transferred to a more general non-product measure setting are: 



¹ If $f$ is the p.d.f. of a Gaussian on $\mathbb{R}$ then $\lambda$ is its standard deviation. 
² Our $h(\cdot)$ and $\ell$ are different from the $h_{old}$ and $\ell_{old}$ used in [RGG97]. However they can be recovered from the identities 
$\ell_{old}^2 = 2\lambda^2\ell^2$, $h_{old}(\ell_{old}) = 2\lambda^2 h(\ell)$. 



(i) the scaling of the proposal variance with $N$ in order to ensure order one acceptance probabilities; (ii) the 
derivation of diffusion limits for the RWM algorithm with a time-scale factor which can be maximized over all 
acceptance probabilities. There is some work concerning scaling limits for MCMC methods applied to target 
measures which are not of product form: the paper [Bed09] studies hierarchical target distributions; the paper 
[BPS04] studies target measures which arise in nonlinear regression, and have a mean field structure, and the 
paper [BR00] studies target densities which are Gibbs measures. We add further to this literature on scaling 
limits for measures with non-product form by adopting the framework studied in [BRSV08, BRS09, BS08]. 
There the authors consider a target distribution $\pi$ which lies in an infinite dimensional, real separable Hilbert 
space and which is absolutely continuous with respect to a Gaussian measure $\pi_0$ with mean zero and covariance 
operator $C$ (see Section 2.1 for details). The Radon-Nikodym derivative has the form: 

$$\frac{d\pi}{d\pi_0} = M_\Psi \exp(-\Psi(x)) \qquad (1.6)$$

for a real valued $\pi_0$-measurable functional $\Psi$ on the Hilbert space and $M_\Psi$ a normalizing constant. In 
Section 3.1 we will specify, and discuss, the precise assumptions on $\Psi$ which we adopt in this paper. This 
infinite dimensional framework for the target measures, besides being able to capture a huge number of 
useful models arising in practice [HSV10, Stu10], also has an inherent mathematical structure which makes 
it amenable to the derivation of diffusion limits in infinite dimensions, whilst retaining links to the product 
structure that has been widely studied. We highlight two aspects of this mathematical structure. 

Firstly, the theory of Gaussian measures naturally generalizes from $\mathbb{R}^N$ to infinite dimensional Hilbert 
spaces. Let $(\mathcal{H}, \langle\cdot,\cdot\rangle, \|\cdot\|)$ denote a real separable Hilbert space with full measure under $\pi_0$ ($\Psi$ will be densely 
defined on $\mathcal{H}$). The covariance operator $C : \mathcal{H} \to \mathcal{H}$ is a self-adjoint, positive, and trace class operator on $\mathcal{H}$ 
with a complete orthonormal eigenbasis $\{\lambda_j^2, \phi_j\}$: 

$$C\phi_j = \lambda_j^2\,\phi_j.$$

Henceforth we assume that the eigenvalues are arranged in decreasing order and $\lambda_j > 0$. Any function $x \in \mathcal{H}$ 
can be represented in the orthonormal eigenbasis of $C$ via the expansion 

$$x = \sum_{j=1}^{\infty} x_j \phi_j, \qquad x_j = \langle x, \phi_j\rangle. \qquad (1.7)$$

Throughout this paper we will often identify the function $x$ with its coordinates $\{x_j\}_{j=1}^{\infty} \in \ell^2$ in this 
eigenbasis, moving freely between the two representations. Note, in particular, that $C$ is diagonal with 
respect to the coordinates in this eigenbasis. By the Karhunen-Loève [DPZ92] expansion, a realization $x$ 
from the Gaussian measure $\pi_0$ can be expressed by allowing the $x_j$ to be independent random variables 
distributed as $x_j \sim N(0, \lambda_j^2)$. Thus, in the coordinates $\{x_j\}$, the prior has the product structure (1.5). For 
the random walk algorithm studied in this paper we assume that the eigenpairs $\{\lambda_j^2, \phi_j\}$ are known so that 
sampling from $\pi_0$ is straightforward. 
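
To illustrate why known eigenpairs make sampling from the Gaussian reference measure straightforward, here is a small sketch of a truncated Karhunen-Loève draw. The spectrum $\lambda_j = 1/(j\pi)$ with $\phi_j(t) = \sqrt{2}\sin(j\pi t)$ (the Brownian bridge on $[0,1]$) is an assumed example chosen for concreteness, and the function names are hypothetical; the paper itself keeps $C$ general.

```python
import math
import random


def sample_prior_coordinates(lam, rng):
    """Draw the first len(lam) Karhunen-Loeve coordinates x_j ~ N(0, lam_j^2)."""
    return [l * rng.gauss(0.0, 1.0) for l in lam]


def evaluate(coords, t):
    """Evaluate sum_j x_j phi_j(t) for the assumed basis phi_j(t) = sqrt(2) sin(j pi t)."""
    return sum(x * math.sqrt(2.0) * math.sin((j + 1) * math.pi * t)
               for j, x in enumerate(coords))


N = 500
# Assumed example spectrum: lam_j = 1 / (j pi), the Brownian bridge on [0, 1].
lam = [1.0 / ((j + 1) * math.pi) for j in range(N)]
rng = random.Random(0)
x = sample_prior_coordinates(lam, rng)
path = [evaluate(x, i / 100) for i in range(101)]
```

The coordinates are independent, so the cost of one draw is linear in the truncation level $N$.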

The measure $\pi$ is absolutely continuous with respect to $\pi_0$ and hence any almost sure property under $\pi_0$ 
is also true under $\pi$. For example, it is a consequence of the law of large numbers that, almost surely with 
respect to $\pi_0$, 

$$\frac{1}{N}\sum_{j=1}^{N} \frac{x_j^2}{\lambda_j^2} \to 1 \quad \text{as } N \to \infty. \qquad (1.8)$$

This also holds almost surely with respect to $\pi$, implying that a typical draw from the target measure $\pi$ 
must behave like a typical draw from $\pi_0$ in the large $j$ coordinates.³ This offers hope that ideas from the 
product case are applicable to measures $\pi$ given by (1.6) as well. However the presence of $\Psi$ prevents use 
of the techniques from previous work on this problem: the fact that individual components of the Markov 
chain converge to a scalar SDE, as proved in [RGG97], is a direct consequence of the product structure 



³ For example, if $\pi_0$ is the Gaussian measure associated with Brownian motion on a finite interval, then (1.8) is an expression 
for the variance scale in the quadratic variation, and this is preserved under changes of measure such as the Girsanov formula. 
inherent in (1.1) or (1.5). For target measures of the form (1.6) this structure is not present, and individual 
components of the Markov chain cannot be expected to converge to a scalar SDE. However it is natural to 
expect convergence of the entire Markov chain to an infinite dimensional continuous time stochastic process 
and the purpose of this paper is to carry out such a program. 

Thus the second fact which makes the target measure (1.6) attractive from the point of view of establishing 
diffusion limits is the fact that, as proved in a series of recent papers [HAVW05, HSV07], it is invariant for 
Hilbert-space valued SDEs (or stochastic PDEs, SPDEs) with the form 

$$\frac{dz}{dt} = -h(\ell)\,\big(z + C\nabla\Psi(z)\big) + \sqrt{2h(\ell)}\,\frac{dW}{dt}, \qquad z(0) = z^0 \qquad (1.9)$$

where $W$ is a Brownian motion (see [DPZ92]) in $\mathcal{H}$ with covariance operator $C$. Thus the above result from 
SPDE theory gives us a natural candidate for the infinite dimensional limit of an MCMC method. We will 
prove such a limit for a RWM algorithm with proposal covariance $\frac{2\ell^2}{N} C$. Moreover we will show that the 
time constant $h(\ell)$ is maximized for an average acceptance probability of 0.234, as obtained in [RGG97] in 
the product case. 

These measures $\pi$ given by (1.6) have a number of features which will enable us to develop the ideas 
of diffusion limits for MCMC methods as originally introduced in the i.i.d. product case. Carrying out this 
program is worthwhile because measures of the form given by (1.6) arise naturally in a range of applications. 
In particular they arise in the context of nonparametric regression in Bayesian statistics where the parameter 
space is an infinite dimensional function space. The measure $\pi_0$ is the prior and $\Psi$ the negative log likelihood function. 
Such Bayesian inverse problems are overviewed in [Stu10]. Another class of problems leading to measures of 
the form (1.6) are conditioned diffusions: see [HSV10]. 

To sample from $\pi$ numerically we need a finite dimensional target measure. To this end let $\Psi^N(\cdot) = \Psi(P^N \cdot)$ 
where $P^N$ denotes projection⁴ (in $\mathcal{H}$) onto the first $N$ eigenfunctions of $C$. Then consider the target measure 
$\pi^N$ with the form 

$$\frac{d\pi^N}{d\pi_0}(x) \propto \exp(-\Psi^N(x)). \qquad (1.10)$$

This measure can be factored as the product of two independent measures: it coincides with $\pi_0$ on $\mathcal{H} \setminus P^N\mathcal{H}$ 
and has a density with respect to Lebesgue measure on $P^N\mathcal{H}$, in the coordinates $\{x_j\}_{j=1}^{N}$. In computational 
practice we implement a random walk method on $\mathbb{R}^N$ in the coordinate system $\{x_j\}_{j=1}^{N}$, enabling us to 
sample from $\pi^N$ in $P^N\mathcal{H}$. However, in order to facilitate a clean analysis, it is beneficial to write this finite 
dimensional random walk method in $\mathcal{H}$, noting that the coordinates $\{x_j\}_{j=N+1}^{\infty}$ in the representation of 
functions sampled from $\pi^N$ do not then change. We consider proposal distributions for the RWM which 
exploit the covariance structure of $\pi_0$ and can be expressed in $\mathcal{H}$ as: 

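$$y = x + \sqrt{\frac{2\ell^2}{N}}\,C^{1/2}\xi \quad \text{where } \xi = \sum_{j=1}^{N} \xi_j \phi_j \text{ with } \xi_j \sim N(0, 1) \text{ i.i.d.} \qquad (1.11)$$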

Note that our proposal variance scales as $N^{-\gamma}$ with $\gamma = 1$. The choice of $\gamma$ in the proposal variance affects 
the scale of the proposal moves and identifying the optimal choice for $\gamma$ is a delicate exercise. The larger $\gamma$ is 
the more 'localized' the proposed move is, and therefore for the algorithm to explore the state space rapidly, 
$\gamma$ needs to be as small as possible. However, if we take $\gamma$ arbitrarily small, then the acceptance probability 
decreases to zero very rapidly as a function of $N$. In fact it was shown in [BRS09, BS08, BRSV08] that, 
for a variety of Metropolis-Hastings proposals, there is $\gamma_c > 0$ such that choice of $\gamma < \gamma_c$ leads to average 
acceptance probabilities which are smaller than any inverse power of $N$. Thus in higher dimensions, smaller 
values of $\gamma$ lead to very poor mixing because of the negligible acceptance probability. However, it turns out 
that at the critical value $\gamma_c$, the acceptance probability is $O(1)$ as a function of $N$. In [BRS09, BS08], the 
value of $\gamma_c$ was identified to be 1 and 1/3 for the RWM and MALA respectively. Finally, when using the 
scalings leading to $O(1)$ acceptance probabilities, it was also shown that the mean square distance moved is 
maximized by choosing the acceptance probabilities to be 0.234 or 0.574 as in the i.i.d. product case (1.1). 



⁴ Actually $\Psi$ is only densely defined on $\mathcal{H}$ but the projection $P^N$ can also be defined on this dense subset. 
Guided by this intuition, we have chosen $\gamma = \gamma_c = 1$ for our RWM proposal variance which, as we will prove 
below, leads to $O(1)$ acceptance probabilities. 
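
The role of the exponent $\gamma$ can be seen in a small Monte Carlo experiment. The sketch below is an illustration under simplifying assumptions, not the paper's analysis: it takes $\Psi = 0$, so the target is the Gaussian reference measure, works in whitened coordinates $u = C^{-1/2}x$, and estimates the stationary acceptance probability for proposal variance $2\ell^2 N^{-\gamma}$. The function name and parameter values are hypothetical.

```python
import math
import random


def mean_acceptance(N, gamma, ell, n_samples, rng):
    """Monte Carlo estimate of the stationary acceptance probability when Psi = 0.

    Works in whitened coordinates u = C^{-1/2} x, so that u ~ N(0, I_N) under
    the Gaussian reference measure and the proposal is u + sqrt(delta) * xi
    with delta = 2 * ell^2 * N^{-gamma}.
    """
    step = math.sqrt(2.0 * ell * ell * N ** (-gamma))
    total = 0.0
    for _ in range(n_samples):
        q = 0.0
        for _ in range(N):
            u = rng.gauss(0.0, 1.0)
            v = u + step * rng.gauss(0.0, 1.0)
            q += 0.5 * (u * u - v * v)   # log-density difference, one coordinate
        total += math.exp(min(0.0, q))   # acceptance probability 1 ∧ exp(q)
    return total / n_samples


rng = random.Random(0)
ell = 1.68
rates = {g: mean_acceptance(200, g, ell, 1500, rng) for g in (0.5, 1.0, 1.5)}
```

In this toy setting the estimate for $\gamma = 1/2$ is essentially zero, the estimate for $\gamma = 1$ is close to 0.234, and the estimate for $\gamma = 3/2$ is much larger because the proposed moves are very small.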

Summarizing the discussion so far, our goal is to obtain an invariance principle for the RWM Markov 
chain with proposal (1.11) when applied to target measures of the form (1.6). The diffusion limit will be 
obtained in stationarity and will be given by the SPDE (1.9). We show that the continuous time interpolant 
$z^N$ of the Markov chain $\{x^k\}$ defined by (1.2) converges to $z$ solving (1.9). This will show that, in stationarity 
and properly scaled to achieve $O(1)$ acceptance probabilities, the random walk Metropolis algorithm takes 
O(N) steps to explore the target distribution. From a practical point of view the take home message of this 
work is that standard RWM algorithms applied to approximations of target measures with the form (1.6) 
can be tuned to behave optimally by adjusting the acceptance probability to be approximately 0.234 in the 
case where the proposal covariance is proportional to the covariance C in the reference measure. This will 
lead to O(N) steps to explore the target measure in stationarity. This extends the work in [RGG97] and 
shows that the ideas developed there apply to non-trivial high dimensional targets arising in applications. 
Although we only analyze the RWM proposal (1.11), we believe that our techniques can be applied to a larger 
class of Metropolis-Hastings methods, including the MALA algorithm, and/or RWM methods with isotropic 
proposal variance. In this latter case we expect to get a different (non-preconditioned) $\pi$-invariant SPDE 
as the limit when the dimension goes to infinity (see [HAVW05, HSV07] for analysis of these SPDEs) and 
a different (more severe) restriction on the scaling of the proposal variance with $N$; however we conjecture 
that the optimal acceptance probability would not be changed. The proposal that we study in this paper 
relies on knowledge of the eigenstructure of the covariance operator of the prior or reference measure $\pi_0$. In 
some applications, this may be a reasonable assumption, for example for conditioned diffusions or for PDE 
inverse problems in simple geometries. For others it may not, and then the isotropic proposal covariance is 
more natural. 

We analyze the RWM algorithm started at stationarity, and thus do not attempt to answer the question 
of 'burn-in time': the number of steps required to reach stationarity and how the proposal scaling affects the 
rate of convergence. These are important questions which we hope to answer in a future paper. Furthermore 
practitioners wishing to sample from probability measures on function space with the form (1.6) should be 
aware that for some examples, new generalizations of random walk Metropolis algorithms, defined on function 
space, can be more efficient than the standard random walk methods analyzed in this paper [BRSV08, BS08]; 
whether or not they are more efficient depends on a trade-off between number of steps to explore the measure 
(which is lower for the new generalized methods) and cost per step (which can be higher, but may not be). 

There exist several methods in the literature to prove invariance principles. For instance, because of the 
reversibility of the RWM Markov chain, utilizing the abstract but powerful theory of Dirichlet forms [MR92] 
is appealing. Another alternative is to show the convergence of generators of the associated Markov processes 
[EK86] as used in [RGG97]. However, we chose a more 'hands on' approach using simple probabilistic tools, 
thus gaining more intuition about the RWM algorithm in higher dimensions. We show that with the correct 
choice of scaling, the one step transition for the RWM Markov chain behaves nearly like an Euler scheme 
applied to (1.9). Since the noise enters (1.9) additively, the induced Itô map which takes Wiener trajectories 
into solutions is continuous in the supremum-in-time topology. This fact, which would not be true if (1.9) 
had multiplicative noise, allows us to employ an argument simpler than the more general techniques often 
used (see [EK86]). We first show that the martingale increments converge weakly to a Hilbert space-valued 
Wiener process using a martingale central limit theorem [Ber86]. Since weak convergence is preserved under 
a continuous map, the fact that the Itô map is continuous implies the RWM Markov chain converges to 
the SPDE (1.9). Finally we emphasize that diffusion limits for the RWM proposal are necessarily of weak 
convergence type. However strong convergence results are available for the MALA algorithm, in fixed finite 
dimension; see [BRVE09]. 

1.1. Organization of the Paper 

We start by setting up the notation that is used for the remainder of the paper in Section 2. We then 
investigate the mathematical structure of the RWM algorithm when applied to target measures of the 
form (1.10). Before presenting details, a heuristic but detailed outline of the proof strategy is given for 
communicating the main ideas. In Section 3 we state our assumptions and give the proof of the main 
theorem at a high level, postponing proofs of some technical estimates. In Section 4 we prove the invariance 
principle for the noise process. Section 5 contains the proof of the drift and diffusion estimates. All universal 
constants, unless otherwise stated, are denoted by the letter M whose precise value might vary from one line 
to the next. 



2. Diffusion Limits of the RWM Algorithm 

In this section we state the main theorem, set it in context, and explain the proof technique. We first 
introduce an approximation of the measure $\pi$, namely $\pi^N$, which is finite dimensional. We then state the 
main theorem concerning a diffusion limit of the algorithm and sketch the ideas of the proof so that technical 
details in later sections can be readily digested. 



2.1. Preliminaries 

Recall that $\mathcal{H}$ is a separable Hilbert space of real valued functions with inner-product and norm $\langle\cdot,\cdot\rangle$ and 
$\|\cdot\|$. Let $C$ be a positive, trace class operator on $\mathcal{H}$. Let $\{\phi_j, \lambda_j^2\}$ be the eigenfunctions and eigenvalues of $C$ 
respectively, so that 

$$C\phi_j = \lambda_j^2\,\phi_j, \qquad j \in \mathbb{N}.$$

We assume a normalization under which $\{\phi_j\}$ forms a complete orthonormal basis in $\mathcal{H}$. We also assume 
that the eigenvalues are arranged in decreasing order. For every $x \in \mathcal{H}$ we have the representation (1.7). 
Using this expansion, we define the Sobolev spaces $\mathcal{H}^r$, $r \in \mathbb{R}$, with the inner-products and norms defined by 

$$\langle x, y\rangle_r = \sum_{j=1}^{\infty} j^{2r} x_j y_j, \qquad \|x\|_r^2 = \sum_{j=1}^{\infty} j^{2r} x_j^2. \qquad (2.1)$$

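Notice that $\mathcal{H}^0 = \mathcal{H}$. Furthermore $\mathcal{H}^r \subset \mathcal{H} \subset \mathcal{H}^{-r}$ for any $r > 0$. For $r \in \mathbb{R}$, let $B_r : \mathcal{H} \to \mathcal{H}$ denote the 
operator which is diagonal in the basis $\{\phi_j\}$ with diagonal entries $j^{2r}$, i.e., 

$$B_r\,\phi_j = j^{2r}\phi_j$$

so that $B_r^{1/2}\phi_j = j^r\phi_j$. The operator $B_r$ lets us alternate between the Hilbert space $\mathcal{H}$ and the Sobolev spaces 
$\mathcal{H}^r$ via the identities: 

$$\langle x, y\rangle_r = \langle B_r^{1/2}x, B_r^{1/2}y\rangle, \qquad \|x\|_r^2 = \|B_r^{1/2}x\|^2. \qquad (2.2)$$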
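
In coordinates these definitions are simple weighted sums. The following sketch, with hypothetical helper names and an arbitrary example sequence, computes $\langle x, y\rangle_r$ on truncated coordinate vectors and checks the identity $\|x\|_r^2 = \|B_r^{1/2}x\|^2$ numerically.

```python
import math


def inner_r(x, y, r):
    """<x, y>_r = sum_j j^{2r} x_j y_j for coordinate sequences truncated to len(x)."""
    return sum((j ** (2.0 * r)) * a * b
               for j, (a, b) in enumerate(zip(x, y), start=1))


def norm_r(x, r):
    return math.sqrt(inner_r(x, x, r))


def apply_B_half(x, r):
    """Coordinates of B_r^{1/2} x: the j-th coordinate is multiplied by j^r."""
    return [(j ** r) * a for j, a in enumerate(x, start=1)]


x = [1.0 / j ** 2 for j in range(1, 1001)]        # example coordinates x_j = j^{-2}
lhs = norm_r(x, 1.0) ** 2                         # ||x||_1^2
rhs = sum(c * c for c in apply_B_half(x, 1.0))    # ||B_1^{1/2} x||^2
```

For this example sequence both sides equal the partial sum of $\sum_j j^{-2}$, so they are close to $\pi^2/6$.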
Let $\otimes$ denote the outer product operator in $\mathcal{H}$ defined by 

$$(x \otimes y)z = \langle y, z\rangle\,x, \qquad \forall x, y, z \in \mathcal{H}. \qquad (2.3)$$

For an operator $L : \mathcal{H}^r \to \mathcal{H}^l$, we denote its operator norm by $\|\cdot\|_{\mathcal{L}(\mathcal{H}^r, \mathcal{H}^l)}$, defined by 

$$\|L\|_{\mathcal{L}(\mathcal{H}^r, \mathcal{H}^l)} = \sup_{\|x\|_r = 1} \|Lx\|_l.$$

For self-adjoint $L$ and $r = l = 0$ this is, of course, the spectral radius of $L$. For a positive, self-adjoint operator 
$D : \mathcal{H} \to \mathcal{H}$, define its trace: 

$$\text{trace}(D) = \sum_{j=1}^{\infty} \langle \phi_j, D\phi_j\rangle.$$

Since trace($D$) does not depend on the orthonormal basis, an operator $D$ is said to be trace class if trace($D$) < $\infty$ 
for some, and hence any, orthonormal basis $\{\phi_j\}$. 



Let $\pi_0$ denote a mean zero Gaussian measure on $\mathcal{H}$ with covariance operator $C$, i.e., $\pi_0 = N(0, C)$. If 
$x \sim \pi_0$, then the $x_j$ in (1.7) are independent $N(0, \lambda_j^2)$ Gaussians and we may write (Karhunen-Loève) 

$$x = \sum_{j=1}^{\infty} \lambda_j\,\rho_j\,\phi_j, \quad \text{with } \rho_j \sim N(0, 1) \text{ i.i.d.} \qquad (2.4)$$

Since $\|B_r^{-1/2}\phi_k\|_r = \|\phi_k\| = 1$, we deduce that $\{B_r^{-1/2}\phi_k\}$ form an orthonormal basis for $\mathcal{H}^r$ and therefore 
we may write (2.4) as 

$$x = \sum_{j=1}^{\infty} \lambda_j\,j^r\,\rho_j\,B_r^{-1/2}\phi_j, \quad \text{with } \rho_j \sim N(0, 1) \text{ i.i.d.} \qquad (2.5)$$

If $\Omega$ denotes the probability space for sequences $\{\rho_j\}_{j \ge 1}$ then the sum converges in $L^2(\Omega; \mathcal{H}^r)$ as long as 
$\sum_{j=1}^{\infty} \lambda_j^2 j^{2r} < \infty$. Thus, under this condition, the distribution induced by $\pi_0$ may be viewed as that of a 
centered Gaussian measure on $\mathcal{H}^r$ with covariance operator $C_r$ given by 

$$C_r = B_r^{1/2}\,C\,B_r^{1/2}. \qquad (2.6)$$

The assumption on summability is the usual trace-class condition for Gaussian measures on a Hilbert space: 
trace($C_r$) < $\infty$. In what follows, we freely alternate between the Gaussian measures $N(0, C)$ on $\mathcal{H}$ and 
$N(0, C_r)$ on $\mathcal{H}^r$, for values of $r$ for which the trace-class property of $C_r$ holds. 
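
As a concrete illustration of the summability condition, suppose (purely as an assumption for this example) that the eigenvalues decay algebraically, $\lambda_j = j^{-\kappa}$. Then trace($C_r$) $= \sum_j j^{2(r - \kappa)}$, which is finite exactly when $r < \kappa - 1/2$. The sketch below prints partial sums for $\kappa = 1$; the function name is hypothetical.

```python
import math


def partial_trace(kappa, r, n_terms):
    """Partial sum of trace(C_r) = sum_j lam_j^2 j^{2r}, assuming lam_j = j^{-kappa}."""
    return sum(j ** (2.0 * (r - kappa)) for j in range(1, n_terms + 1))


# With kappa = 1 the full sum is finite exactly when r < 1/2.
for r in (0.0, 0.25, 0.5):
    sums = [partial_trace(1.0, r, n) for n in (10**3, 10**4, 10**5)]
    print(r, [round(s, 3) for s in sums])
```

For $r = 0$ the partial sums settle near $\pi^2/6$, for $r = 1/4$ they converge slowly, and for $r = 1/2$ they grow like $\log n$, so the trace-class property fails at that exponent.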
Our goal is to sample from a measure $\pi$ on $\mathcal{H}$ given by (1.6): 

$$\frac{d\pi}{d\pi_0} = M_\Psi \exp(-\Psi(x))$$

with $\pi_0$ as constructed above. Frequently in applications, the functional $\Psi$ may not be defined on all of 
$\mathcal{H}$, but only on a subset $\mathcal{H}^r \subset \mathcal{H}$ for some exponent $r > 0$. For instance, if $\mathcal{H} = L^2([0, 1])$, the functional 
$\Psi$ might only act on continuous functions, in which case it is natural to define $\Psi$ on some Sobolev space 
$\mathcal{H}^r[0, 1]$ for $r > \frac{1}{2}$. Even though the Gaussian measure $\pi_0$ is defined on $\mathcal{H}$, depending on the decay of the 
eigenvalues of $C$, there exists an entire range of values $r$ such that trace($C_r$) < $\infty$ so that the measure $\pi_0$ 
has full support on $\mathcal{H}^r$, i.e., $\pi_0(\mathcal{H}^r) = 1$. From now onwards we fix a distinguished exponent $s \ge 0$ and 
assume that $\Psi : \mathcal{H}^s \to \mathbb{R}$ and that the prior is chosen so that trace($C_s$) < $\infty$. Then $\pi_0 \sim N(0, C)$ on $\mathcal{H}$ 
and $\pi(\mathcal{H}^s) = 1$; in addition we may view $\pi_0$ as a Gaussian measure $N(0, C_s)$ on $\mathcal{H}^s$. The precise connection 
between the exponent $s$ and the eigenvalues of $C$ is given in Section 3.1. 

In order to sample from $\pi$ we first approximate it by a finite dimensional measure. Recall that 

$$\hat\phi_j = B_s^{-1/2}\phi_j \qquad (2.7)$$

form an orthonormal basis for $\mathcal{H}^s$. For $N \in \mathbb{N}$, let $P^N : \mathcal{H}^s \to X^N \subset \mathcal{H}^s$ be the projection operator in $\mathcal{H}^s$ 
onto $X^N = \text{span}\{\hat\phi_1, \hat\phi_2, \ldots, \hat\phi_N\}$, i.e., 

$$P^N x = \sum_{j=1}^{N} x_j \hat\phi_j \quad \text{where } x_j = \langle x, \hat\phi_j\rangle_s, \quad x \in \mathcal{H}^s.$$

This shows that $X^N$ is isomorphic to $\mathbb{R}^N$. Next, we approximate $\Psi$ by $\Psi^N : X^N \to \mathbb{R}$ and attempt to sample 
from the following approximation to $\pi$, namely 

$$\frac{d\pi^N}{d\pi_0}(x) = M_{\Psi^N} \exp(-\Psi^N(x)) \quad \text{where } \Psi^N(x) = \Psi(P^N x).$$

Note that $\nabla\Psi^N(x) = P^N \nabla\Psi(P^N x)$ and $\partial^2\Psi^N(x) = P^N \partial^2\Psi(P^N x) P^N$. The constant $M_{\Psi^N}$ is chosen so that 
$\pi^N(\mathcal{H}^s) = 1$. It may be shown that, for large $N$, the measure $\pi^N$ is close to the measure $\pi$ in the Hellinger 
metric (see [CDS]). Set 

$$C^N = P^N C P^N, \qquad C_r^N = B_r^{1/2}\,C^N B_r^{1/2}. \qquad (2.8)$$



Notice that on $X^N$, $\pi^N$ has Lebesgue density⁵ 

$$\pi^N(x) = M_{\Psi^N} \exp\Big(-\Psi^N(x) - \frac{1}{2}\langle P^N x, C^{-1}(P^N x)\rangle\Big), \qquad x \in X^N$$
$$\phantom{\pi^N(x)} = M_{\Psi^N} \exp\Big(-\Psi^N(x) - \frac{1}{2}\langle x, (C^N)^{-1}x\rangle\Big) \qquad (2.9)$$

since $C^N$ is invertible on $X^N$ because the eigenvalues are assumed to be strictly positive. On $\mathcal{H}^s \setminus X^N$ we 
have that $\pi^N = \pi_0$. Later we will impose natural assumptions on $\Psi$ (and hence on $\Psi^N$) which are motivated 
by applications. 



2.2. The Algorithm 

Our goal is now to sample from (2.9) with $x \in X^N$. As explained in the introduction, we use a RWM proposal 
with covariance operator $\frac{2\ell^2}{N} C$ on $\mathcal{H}$ given by (1.11). The noise $\xi$ is finite dimensional and is independent 
of $x$. Hence, even though the Markov chain evolves in $\mathcal{H}^s$, $x$ and $y$ in (1.11) differ only in the first $N$ 
coordinates when written in the eigenbasis of $C$; as a consequence the Markov chain does not move at all 
in $\mathcal{H}^s \setminus P^N\mathcal{H}^s$ and can be implemented in $\mathbb{R}^N$. However the analysis is cleaner when written in $\mathcal{H}^s$. The 
acceptance probability also only depends on the first $N$ coordinates of $x$ and $y$ and has the form 



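$$\alpha(x, \xi) = 1 \wedge \exp(Q(x, \xi)) \qquad (2.10)$$

where 

$$Q(x, \xi) = \frac{1}{2}\big\|C^{-1/2}(P^N x)\big\|^2 - \frac{1}{2}\big\|C^{-1/2}(P^N y)\big\|^2 + \Psi^N(x) - \Psi^N(y). \qquad (2.11)$$

The Markov chain for $\{x^k\}$, $k \ge 0$, is then given by 

$$x^{k+1} = \gamma^{k+1} y^{k+1} + (1 - \gamma^{k+1})\,x^k \quad \text{and} \quad y^{k+1} = x^k + \sqrt{\frac{2\ell^2}{N}}\,C^{1/2}\xi^{k+1} \qquad (2.12)$$

with 

$$\gamma^{k+1} \sim \text{Bernoulli}\big(\alpha(x^k, \xi^{k+1})\big) \quad \text{and} \quad \xi^{k+1} = \sum_{i=1}^{N} \xi_i^{k+1}\phi_i, \quad \text{where } \xi_i^{k+1} \sim N(0, 1) \text{ i.i.d.},$$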



with some initial condition $x^0$. The random variables $\xi^k$ and $x^0$ are independent of one another. Furthermore, 
conditional on $\alpha(x^{k-1}, \xi^k)$, the Bernoulli random variables $\gamma^k$ are chosen independently of all other sources of 
randomness. This can be seen in the usual way by introducing an i.i.d. sequence of uniform random variables 
Unif[0, 1] and using these for each $k$ to construct the Bernoulli random variable. 

In summary, the Markov chain that we have described in $\mathcal{H}^s$ is, when projected into coordinates $\{x_j\}_{j=1}^{N}$, 
equivalent to a standard random walk Metropolis method for the Lebesgue density (2.9) with proposal 
variance given by $C^N$ on $\mathcal{H}$. Recall that the target measure $\pi$ in (1.6) is the invariant measure of the SPDE 
(1.9). Our goal is to obtain an invariance principle for the continuous interpolant (1.2) of the Markov chain 
$\{x^k\}$ started in stationarity: to show weak convergence of $z^N(t)$ to the solution $z(t)$ of the SPDE (1.9), as 
the dimension $N \to \infty$. 
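
A minimal sketch of the chain (2.12), written in the coordinates $\{x_j\}_{j=1}^N$ of the eigenbasis of $C$, may help fix ideas. In these coordinates $C^{1/2}\xi$ has entries $\lambda_j \xi_j$ and $\|C^{-1/2}P^N x\|^2 = \sum_j x_j^2/\lambda_j^2$. The function name `rwm_chain`, the callable `psi` and the spectrum $\lambda_j = 1/j$ in the usage lines are assumptions made for illustration only.

```python
import math
import random


def rwm_chain(x0, lam, psi, ell, n_steps, rng):
    """Run the RWM chain (2.12) in the coordinates x_j of the eigenbasis of C.

    x0  : initial coordinates (length N)
    lam : lam[j] is the square root of the j-th eigenvalue of C
    psi : hypothetical callable, Psi^N evaluated on a coordinate vector
    ell : tuning parameter in the proposal covariance (2 ell^2 / N) C
    """
    N = len(x0)
    scale = math.sqrt(2.0 * ell * ell / N)
    x = list(x0)
    path, n_accepted = [x], 0
    for _ in range(n_steps):
        # Proposal (1.11): C^{1/2} xi has coordinates lam_j * xi_j.
        y = [xj + scale * lj * rng.gauss(0.0, 1.0) for xj, lj in zip(x, lam)]
        # Q from (2.11): log of the ratio of the densities (2.9) at y and x.
        q = (0.5 * sum((xj / lj) ** 2 for xj, lj in zip(x, lam))
             - 0.5 * sum((yj / lj) ** 2 for yj, lj in zip(y, lam))
             + psi(x) - psi(y))
        if rng.random() < math.exp(min(0.0, q)):   # gamma ~ Bernoulli(1 ∧ e^Q)
            x, n_accepted = y, n_accepted + 1
        path.append(x)
    return path, n_accepted / n_steps


rng = random.Random(0)
N = 100
lam = [1.0 / (j + 1) for j in range(N)]        # assumed spectrum lam_j = 1/j
x0 = [l * rng.gauss(0.0, 1.0) for l in lam]    # a draw from the Gaussian reference measure
path, rate = rwm_chain(x0, lam, lambda x: 0.0, 1.68, 3000, rng)
```

With `psi` identically zero the target coincides with the Gaussian reference measure, so the chain above starts in stationarity, and the observed acceptance rate should be in the vicinity of 0.234 for $\ell \approx 1.68$.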

In the rest of the section, we will give a heuristic outline of our main argument. The emphasis will be 
on the proof strategy and main ideas. So, we will not yet prove the error bounds, and use the symbol "$\approx$" 
to indicate so. Once the main skeleton is outlined, we retrace our arguments and make them rigorous in 
Sections 3, 4 and 5. 



⁵ For ease of notation we do not distinguish between a measure and its density. 
2.3. Main Theorem and Implications 

As mentioned earlier, for fixed $N$ the Markov chain evolves in $X^N \subset \mathcal{H}^s$ and we prove the invariance principle 
for the Markov chain in the Hilbert space $\mathcal{H}^s$ as $N$ goes to infinity. Define the constant $\beta$ 

$$\beta = 2\Phi(-\ell/\sqrt{2}) \qquad (2.13)$$

where $\Phi$ denotes the CDF of the standard normal distribution. Note that with this definition of $\beta$, the time 
scale $h(\ell)$ appearing in (1.9), and defined in (1.4), is given by $h(\ell) = \ell^2\beta$. The following is the main result of 
this article (it is stated precisely, with conditions, as Theorem 3.6): 

MAIN THEOREM: Let the initial condition $x^0$ of the RWM algorithm be such that $x^0 \sim \pi^N$ and let $z^N(t)$ 
be a piecewise linear, continuous interpolant of the RWM algorithm (2.12) as defined in (1.2). Then $z^N(t)$ 
converges weakly in $C([0, T]; \mathcal{H}^s)$ to the diffusion process $z(t)$ given by Equation (1.9) with $z(0) \sim \pi$. 
We will now explain the following two important implications of this result: 

• it demonstrates that, in stationarity, the work required to explore the invariant measure scales as O(N); 

• it demonstrates that the speed at which the invariant measure is explored, again in stationarity, is 
maximized by tuning the average acceptance probability to 0.234. 

The first implication follows from (1.2) since this shows that $O(N)$ steps of the Markov chain (2.12) 
are required for $z^N(t)$ to approximate $z(t)$ on a time interval $[0, T]$ long enough for $z(t)$ to have explored 
its invariant measure. The second implication follows from Equation (1.9) for $z(t)$ itself. The maximum 
of the time-scale $h(\ell)$ over the parameter $\ell$ (see [RGG97]) occurs at a universal acceptance probability 
of $\beta = 0.234$, to three decimal places. Thus, remarkably, the optimal acceptance probability identified in 
[RGG97] for product measures is also optimal for the non-product measures studied in this paper. 
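
The time rescaling behind the first implication can be made explicit with a short sketch of the interpolant (1.2). The helper name `interpolant` and the toy path are hypothetical; the dimension $N$ is read off from the length of the state vectors, so one unit of continuous time $t$ consumes $N$ steps of the chain.

```python
def interpolant(path, t):
    """Piecewise linear interpolant (1.2) between x^k and x^{k+1}, k <= N t < k + 1.

    path : list of states x^0, x^1, ... (each a list of N floats)
    t    : continuous time; one unit of t corresponds to N steps of the chain
    """
    N = len(path[0])
    s = N * t
    k = int(s)                      # k <= N t < k + 1
    if k + 1 >= len(path):
        raise ValueError("path too short: need more than N * t + 1 states")
    w = s - k                       # weight (N t - k) on x^{k+1}
    return [w * b + (1.0 - w) * a for a, b in zip(path[k], path[k + 1])]


# Toy path in dimension N = 2: x^k = (k, 10 k).
toy = [[float(k), 10.0 * k] for k in range(6)]
z = interpolant(toy, 1.25)   # N t = 2.5, halfway between x^2 and x^3
```

Evaluating $z^N$ on $[0, T]$ therefore requires roughly $NT$ steps of the chain, which is the $O(N)$ cost referred to above.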



2.4. Proof Strategy 

Let $\mathcal{F}_k$ denote the sigma algebra generated by $\{x^n, \xi^n, \gamma^n,\ n \le k\}$. We denote the conditional expectations 
$E(\,\cdot\,|\mathcal{F}_k)$ by $E_k(\,\cdot\,)$. We first compute the one-step expected drift of the Markov chain $\{x^k\}$. For notational 
convenience let $x^0 = x$ and $\xi^1 = \xi$. We set $\xi^0 = 0$ and $\gamma^0 = 0$. Then, under the assumptions on $\Psi$, $\Psi^N$ 
given in Section 3.1, we prove the following proposition estimating the mean one-step drift and diffusion. 
The proof is given in Sections 5.2 - 5.3. 

Proposition 2.1. Let Assumptions 3.1 and 3.4 (below) hold. Let $\{x^k\}$ be the RWM Markov chain with 
$x^0 = x \sim \pi^N$. Then 

$$N\,E_0(x^1 - x) = -\ell^2\beta\,\big(P^N x + C^N \nabla\Psi^N(x)\big) + r^N, \qquad (2.14)$$
$$N\,E_0\big[(x^1 - x) \otimes (x^1 - x)\big] = 2\ell^2\beta\,C^N + E^N \qquad (2.15)$$

where the error terms $r^N$ and $E^N$ satisfy $E^{\pi^N}\|r^N\|_s^2 \to 0$, $\sum_{i=1}^{N} E^{\pi^N}|\langle \phi_i, E^N\phi_i\rangle_s| \to 0$ and $E^{\pi^N}|\langle \phi_i, E^N\phi_j\rangle_s| \to 0$ 
as $N \to \infty$, for any pair of indices $i, j$ and for $s$ appearing in Assumptions 3.1. 

Thus the discrete time Markov chain $\{x^k\}$ obtained by the successive accepted samples of the RWM 
algorithm has approximately the expected drift and covariance structure of the SPDE (1.9). It is also crucial 
to our subsequent argument involving the martingale central limit theorem that the error terms $r^N$ and $E^N$ 
converge to zero in the Hilbert space $\mathcal{H}^s$ norm and inner-product as stated. 

With this in hand, we need to establish the appropriate invariancc principle to show that the dynamics 
of the Markov chain {x k }, when seen as the values of a continuous time process on a time mesh with steps 
of 0(1/N), converges weakly to the law of the SPDE given in Equation (1.9) on C([0, T],rl s ). To this end 
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we define, for k > 0, 



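$$m^N(\,\cdot\,) := P^N(\,\cdot\,) + C^N\nabla\Psi^N(\,\cdot\,), \qquad \Gamma^{k+1,N} := \sqrt{\frac{N}{2\ell^2\beta}}\,\bigl(x^{k+1} - x^k - \mathbb{E}_k(x^{k+1} - x^k)\bigr),$$ (2.16)
$$r^{k+1,N} := N\,\mathbb{E}_k(x^{k+1} - x^k) + \ell^2\beta\,\bigl(P^N x^k + C^N\nabla\Psi^N(x^k)\bigr),$$ (2.17)
$$E^{k+1,N} := N\,\mathbb{E}_k\bigl[(x^{k+1} - x^k) \otimes (x^{k+1} - x^k)\bigr] - 2\ell^2\beta\, C^N,$$ (2.18)

with $E^{0,N}, \Gamma^{0,N}, r^{0,N} = 0$. Notice that for fixed $N$, $\{r^{k,N}\}_{k \geq 1}$ and $\{E^{k,N}\}_{k \geq 1}$ are, since $x^0 \sim \pi^N$, stationary
sequences.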

By definition,

$$x^{k+1} = x^k + \mathbb{E}_k(x^{k+1} - x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\;\Gamma^{k+1,N}.$$ (2.19)

From (2.14) in Proposition 2.1, for large enough $N$,

$$x^{k+1} \approx x^k - \frac{\ell^2\beta}{N}\bigl(P^N x^k + C^N\nabla\Psi^N(x^k)\bigr) + \sqrt{\frac{2\ell^2\beta}{N}}\;\Gamma^{k+1,N}
= x^k - \frac{\ell^2\beta}{N}\, m^N(x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\;\Gamma^{k+1,N}.$$ (2.20)

From the definition of $\Gamma^{k,N}$ in (2.16), and from (2.15) in Proposition 2.1,

$$\mathbb{E}_k(\Gamma^{k+1,N}) = 0 \quad\text{and}\quad \mathbb{E}_k(\Gamma^{k+1,N} \otimes \Gamma^{k+1,N}) \approx C^N.$$

Therefore for large enough $N$, Equation (2.20) 'resembles' the Euler scheme for simulating the finite
dimensional approximation of the SPDE (1.9) on $\mathbb{R}^N$, with drift function $m^N(\,\cdot\,)$ and covariance operator
$C^N$:

$$x^{k+1} \approx x^k - h(\ell)\, m^N(x^k)\,\Delta t + \sqrt{2h(\ell)\,\Delta t}\;\Gamma^{k+1,N}, \qquad\text{where } \Delta t := \frac{1}{N}.$$

This is the key idea underlying our main result (Theorem 3.6): the Markov chain (2.12) looks like a weak
Euler approximation of (1.9).
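
To make the objects in this argument concrete, here is a minimal simulation sketch of the RWM chain in the simplest setting. It assumes $\Psi = 0$ (so the target is the Gaussian $\mathrm{N}(0, C)$), standard deviations $\lambda_i = i^{-\kappa}$, and the proposal $y = x + \sqrt{2\ell^2/N}\,C^{1/2}\xi$ written coordinatewise in the eigenbasis; the function names and default parameters are illustrative choices, not taken from the paper. The empirical acceptance rate is compared with $\beta = 2\Phi(-\ell/\sqrt{2})$.

```python
import math
import random

def predicted_acceptance(ell):
    """beta = 2 * Phi(-ell / sqrt(2)), which equals erfc(ell / 2)."""
    return math.erfc(ell / 2.0)

def rwm_acceptance_rate(N=200, ell=1.5, kappa=1.0, steps=3000, seed=0):
    """Run RWM on N(0, C) in the eigenbasis of C and return the acceptance rate.

    Assumptions of this sketch: Psi = 0 and lambda_i = i^{-kappa}.
    """
    rng = random.Random(seed)
    lam = [(i + 1) ** (-kappa) for i in range(N)]
    x = [l * rng.gauss(0.0, 1.0) for l in lam]      # start in stationarity
    step = math.sqrt(2.0 * ell * ell / N)
    accepted = 0
    for _ in range(steps):
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
        y = [xj + step * lj * e for xj, lj, e in zip(x, lam, xi)]
        # Q = log(pi(y) / pi(x)) for the Gaussian target N(0, C)
        Q = 0.5 * sum((xj / lj) ** 2 - (yj / lj) ** 2
                      for xj, yj, lj in zip(x, y, lam))
        if rng.random() < math.exp(min(0.0, Q)):     # accept with prob. 1 ^ e^Q
            x = y
            accepted += 1
    return accepted / steps

rate = rwm_acceptance_rate()
print(f"empirical acceptance {rate:.3f}, predicted {predicted_acceptance(1.5):.3f}")
```

The agreement is only approximate at finite $N$ and finite run length; the sketch illustrates the scaling, it does not replace the error control carried out in Section 5.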

Note that there is an important difference in analyzing the weak convergence from the traditional Euler
scheme. In our case, for any fixed $N \in \mathbb{N}$, $\Gamma^{k,N} \in X^N$ is finite dimensional, but clearly the dimension of
$\Gamma^{k,N}$ grows with $N$. Also the distribution of the initial condition $x(0) \sim \pi^N$ changes with $N$, unlike the
case of the traditional Euler scheme where the distribution of $x(0)$ does not change with $N$. Moreover, for
any fixed $N$, the "noise" process $\{\Gamma^{k,N}\}$ is not formed of independent random variables. However they are
identically distributed (a stationary sequence) because the Metropolis algorithm preserves stationarity. To
obtain an invariance principle, we first use a version of the martingale central limit theorem (Proposition 4.1)
to show that the noise process $\{\Gamma^{k,N}\}$, when rescaled and summed, converges weakly to a Brownian motion
on $C([0,T]; \mathcal{H}^s)$ with covariance operator $C_s$, for any $T = O(1)$. We then use continuity of an appropriate
Ito map to deduce the desired result.

Before we proceed, we introduce some notation. Fix $T > 0$, and define:

$$\Delta t := 1/N, \qquad t^k := k\,\Delta t, \qquad \eta^{k,N} := \sqrt{\Delta t}\,\sum_{l=1}^{k}\Gamma^{l,N}$$ (2.21)

and

$$W^N(t) := \eta^{\lfloor Nt\rfloor, N} + \frac{Nt - \lfloor Nt\rfloor}{\sqrt{N}}\;\Gamma^{\lfloor Nt\rfloor + 1, N}, \qquad t \in [0, T].$$ (2.22)

Let $W(t)$, $t \in [0,T]$, be an $\mathcal{H}^s$ valued Brownian motion with covariance operator $C_s$. Using a martingale
central limit theorem, we will prove the following in Section 4:

Proposition 2.2. Let Assumptions 3.1 (below) hold. Let $x^0 \sim \pi^N$. The process $W^N(t)$ defined in Equation
(2.22) converges weakly to $W$ in $C([0, T]; \mathcal{H}^s)$ as $N$ tends to $\infty$, where $W$ is a Brownian motion in time
with covariance operator $C_s$ in $\mathcal{H}^s$ and $s$ is defined in Assumptions 3.1. Furthermore the pair $(x^0, W^N(t))$
converges weakly to $(z^0, W)$, where $z^0 \sim \pi$ and the Brownian motion $W$ is independent of the initial condition
$z^0$ almost surely.
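
The interpolated noise process $W^N$ of (2.21) and (2.22) is simply a rescaled partial sum with linear interpolation between mesh points. A minimal scalar sketch follows, assuming $T = 1$ and that the increments $\Gamma^{1,N}, \ldots, \Gamma^{N,N}$ are supplied as a plain list; the function names `eta` and `W_N` are illustrative.

```python
import math

def eta(gammas, k):
    """eta^{k,N} = sqrt(dt) * sum_{l=1}^{k} Gamma^{l,N}, with dt = 1/N."""
    N = len(gammas)
    return math.sqrt(1.0 / N) * sum(gammas[:k])

def W_N(gammas, t):
    """Piecewise linear interpolant (2.22) on [0, 1]; gammas[l-1] holds Gamma^{l,N}."""
    N = len(gammas)
    k = min(int(math.floor(N * t)), N - 1)   # clamp so that t = 1 uses the last increment
    frac = N * t - k                          # equals Nt - floor(Nt) away from t = 1
    return eta(gammas, k) + frac / math.sqrt(N) * gammas[k]

increments = [1.0, -2.0, 0.5, 3.0]            # hypothetical increments, N = 4
for t in (0.0, 0.5, 0.625, 1.0):
    print(t, W_N(increments, t))
```

At mesh points $t^k = k/N$ the interpolant equals $\eta^{k,N}$, and between mesh points it moves linearly along the next increment.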

Using this invariance principle for the noise process and the fact that the noise process is additive (the
diffusion coefficient is constant), the invariance principle for the Markov chain follows from a continuous
mapping argument which we now outline. For any $(z^0, W) \in \mathcal{H}^s \times C([0,T]; \mathcal{H}^s)$ we define the Ito map
$\Theta : \mathcal{H}^s \times C([0,T]; \mathcal{H}^s) \to C([0,T]; \mathcal{H}^s)$ by $\Theta : (z^0, W) \mapsto z$, where $z$ solves

$$z(t) = z^0 - h(\ell)\int_0^t \bigl(z(s) + C\nabla\Psi(z(s))\bigr)\,ds + \sqrt{2h(\ell)}\;W(t)$$ (2.23)

for all $t \in [0,T]$, and $h(\ell) = \ell^2\beta$ is as defined in (1.4). Thus $z = \Theta(z^0, W)$ solves the SPDE (1.9) with
$h(\ell) = \ell^2\beta$. We will see in Lemma 3.7 that $\Theta$ is a continuous map from $\mathcal{H}^s \times C([0, T]; \mathcal{H}^s)$ into $C([0, T]; \mathcal{H}^s)$.
We now define the piecewise constant interpolant of $x^k$:

$$\bar{z}^N(t) = x^k \quad\text{for } t \in [t^k, t^{k+1}).$$ (2.24)

Set

$$d^N(x) := N\,\mathbb{E}_0(x^1 - x).$$ (2.25)

Note that $d^N(x) \approx -h(\ell)\, m^N(x)$. We can use $\bar{z}^N$ to construct a continuous piecewise linear interpolant of $x^k$
by defining

$$z^N(t) = z^0 + \int_0^t d^N(\bar{z}^N(s))\,ds + \sqrt{2h(\ell)}\;W^N(t).$$ (2.26)

Notice that $d^N(x)$ defined in (2.25) is a function which depends on arbitrary $x = x^0$ and averages out the
randomness in $x^1$ conditional on fixing $x = x^0$. We may then evaluate this function at any $x \in \mathcal{H}^s$ and, in
particular, at $\bar{z}^N(s)$ as in (2.26). Use of the stationarity of the sequence $x^k$, together with Equations (2.19),
(2.21) and (2.22), reveals that the definition (2.26) coincides with that given in (1.2). Using the closeness of
$d^N$ and $-h(\ell)\, m^N$, of $z^N$ and $\bar{z}^N$, and of $m^N$ and the desired limiting drift, we will see that there exists a
$\widehat{W}^N \Rightarrow W$ as $N \to \infty$, such that

$$z^N(t) = z^0 - h(\ell)\int_0^t \bigl(z^N(s) + C\nabla\Psi(z^N(s))\bigr)\,ds + \sqrt{2h(\ell)}\;\widehat{W}^N(t),$$ (2.27)

so that $z^N = \Theta(z^0, \widehat{W}^N)$. By the continuity of $\Theta$ we will show, using the continuous mapping theorem, that

$$z^N = \Theta(z^0, \widehat{W}^N) \Rightarrow z = \Theta(z^0, W) \quad\text{as } N \to \infty.$$ (2.28)

It will be important to show that the weak limit of $(z^0, \widehat{W}^N)$, namely $(z^0, W)$, comprises two independent
random variables $z^0$ (from the stationary distribution) and $W$.

The weak convergence in (2.28) is the principal result of this article and is stated precisely in Theorem
3.6. To summarize, we have argued that the RWM is well approximated by an Euler approximation of (1.9).
The Euler approximation itself can be seen as an approximate solution of (1.9) with a modified Brownian
motion. As $N \to \infty$, all approximation errors go to zero in the appropriate sense and one deduces that the
RWM algorithm converges to the solution of (1.9).

2.5. A Framework for Expected Drift and Diffusion 

We now turn to the question of how the RWM algorithm produces the appropriate drift and covariance
encapsulated in Proposition 2.1. This result, which shows that the algorithm (approximately) performs a
noisy steepest ascent process, is at the heart of why the Metropolis algorithm works. In the rest of this
section we set up a framework which will be used for deriving the expected drift and diffusion terms.
Recall the setup from Section 2. Starting from (2.11), after some algebra we obtain:

$$Q(x,\xi) = -\sqrt{\frac{2\ell^2}{N}}\,\langle\zeta, \xi\rangle - \frac{\ell^2}{N}\,\|\xi\|^2 - r(x,\xi)$$ (2.29)

where we have defined

$$\zeta := C^{-1/2}(P^N x) + C^{1/2}\nabla\Psi^N(x),$$ (2.30)
$$r(x,\xi) := \Psi^N(y) - \Psi^N(x) - \langle\nabla\Psi^N(x),\, P^N y - P^N x\rangle.$$ (2.31)

Remark 2.3. If $x \sim \pi_0$ in $\mathcal{H}^s$, then the random variable $C^{-1/2}x$ is not well defined in $\mathcal{H}^s$ because $C^{-1/2}$
is not a trace class operator. However Equation (2.30) is still well defined because the operator $C^{-1/2}$ acts
only in $X^N$ for any fixed $N$.

Notice that $C^{1/2}\zeta$ is approximately the drift term in the SPDE (1.9) and this plays a key role in obtaining
the mean drift from the accept/reject mechanism; this point is elaborated on in the arguments leading up
to (2.45). By (3.5) and Assumptions 3.1, 3.4 on $\Psi$ and $\Psi^N$ below, we will obtain a global bound on the
remainder term of the form

$$|r(x,\xi)| \leq M\,\frac{\ell^2}{N}\,\|C^{1/2}\xi\|_s^2.$$ (2.32)

Because of our assumptions on $C$ in (3.1), the moments of $\|C^{1/2}\xi\|_s^2$ stay uniformly bounded as $N \to \infty$.
Hence we will neglect this term to explain the heuristic ideas. Since $\xi = \sum_{i=1}^N \xi_i\phi_i$ with $\xi_i \sim \mathrm{N}(0, 1)$, we find
that for fixed $x$,

$$Q(x,\xi) \approx \mathrm{N}\Bigl(-\ell^2,\ 2\ell^2\,\frac{\|\zeta\|^2}{N}\Bigr)$$ (2.33)

for large $N$ (see Lemma 5.1). Since $x \sim \pi$, we have that $C^{-1/2}(P^N x) = \sum_{j=1}^N \rho_j\phi_j$ where the $\rho_j$ are i.i.d. $\mathrm{N}(0, 1)$.
Much as with the term $r(x, \xi)$ above, the second term in expression (2.30) for $\zeta$ can be seen as a perturbation
term which is small in magnitude compared to the first term in (2.30) as $N \to \infty$. Thus, as shown in Lemma
5.2, we have $\|\zeta\|^2/N \to 1$, $\pi$-almost surely, as $N \to \infty$. Returning to (2.33), this suggests that it is reasonable, for
$N$ sufficiently large, to make the approximation

$$Q(x,\xi) \approx \mathrm{N}(-\ell^2,\ 2\ell^2), \quad \pi\text{-a.s.}$$ (2.34)

Much of this section is concerned with understanding the behavior of one step of the RWM algorithm if we
make the approximation in (2.34). Once this is understood, we will retrace our steps being more careful to
control the approximation error leading to (2.34).
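
The Gaussian approximation can be probed by sampling. The sketch below assumes $\Psi = 0$, so that the remainder $r(x,\xi)$ vanishes and $\zeta$ has i.i.d. $\mathrm{N}(0,1)$ coordinates for $x$ drawn from the Gaussian reference measure; it then samples $Q(x,\xi) = -\sqrt{2\ell^2/N}\,\langle\zeta,\xi\rangle - (\ell^2/N)\|\xi\|^2$ over $\xi$ for one fixed $x$. The function name and the sample sizes are illustrative.

```python
import math
import random

def sample_Q_moments(N=500, ell=1.0, n_samples=2000, seed=1):
    """Sample mean and variance of Q(x, xi) over xi, for one fixed x, with Psi = 0."""
    rng = random.Random(seed)
    zeta = [rng.gauss(0.0, 1.0) for _ in range(N)]    # zeta = C^{-1/2} P^N x
    scale = math.sqrt(2.0 * ell * ell / N)
    qs = []
    for _ in range(n_samples):
        xi = [rng.gauss(0.0, 1.0) for _ in range(N)]
        inner = sum(z * e for z, e in zip(zeta, xi))  # <zeta, xi>
        norm2 = sum(e * e for e in xi)                # ||xi||^2
        qs.append(-scale * inner - (ell * ell / N) * norm2)
    mean = sum(qs) / n_samples
    var = sum((q - mean) ** 2 for q in qs) / (n_samples - 1)
    return mean, var

mean_Q, var_Q = sample_Q_moments()
print(f"sample mean {mean_Q:.3f} (limit -ell^2 = -1), "
      f"sample variance {var_Q:.3f} (limit 2 ell^2 = 2)")
```

In this special case the mean over $\xi$ is exactly $-\ell^2$ and the variance is $2\ell^2\|\zeta\|^2/N + 2\ell^4/N$, so the deviation from $2\ell^2$ comes from the fluctuation of $\|\zeta\|^2/N$ around 1 and from an $O(1/N)$ correction.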

The following lemma concerning normal random variables will be critical to identifying the source of the
observed drift. It gives us the relation between the constants in the expected drift and diffusion coefficients
which ensures $\pi$ invariance, as will be seen later in this section.

Lemma 2.4. Let $Z_\ell \sim \mathrm{N}(-\ell^2, 2\ell^2)$. Then $\mathbb{P}(Z_\ell > 0) = \mathbb{E}(e^{Z_\ell} 1_{Z_\ell < 0}) = \Phi(-\ell/\sqrt{2})$ and

$$\mathbb{E}(1 \wedge e^{Z_\ell}) = 2\Phi(-\ell/\sqrt{2}) = \beta.$$ (2.35)

Furthermore, if $z \sim \mathrm{N}(0, 1)$ then

$$\mathbb{E}\bigl[z\,(1 \wedge e^{az+b})\bigr] = a\,\exp(a^2/2 + b)\,\Phi\Bigl(-\frac{b}{|a|} - |a|\Bigr)$$ (2.36)

for any real constants $a$ and $b$.

Proof. A straightforward calculation. See Lemma 2 in [BRS09]. □ 
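
Both identities of Lemma 2.4 are easy to check numerically. The following sketch evaluates the Gaussian expectations with a midpoint rule and compares them with the closed forms (2.35) and (2.36); the quadrature range, the number of nodes and the function names are illustrative choices, and the check assumes $a \neq 0$.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def gauss_expect(f, lo=-12.0, hi=12.0, n=50_000):
    """E f(z) for z ~ N(0, 1), approximated by the midpoint rule on [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        z = lo + (i + 0.5) * dx
        total += f(z) * math.exp(-0.5 * z * z)
    return total * dx / math.sqrt(2.0 * math.pi)

def lhs_235(ell):
    """E(1 ^ e^{Z}) with Z = -ell^2 + sqrt(2) * ell * z."""
    return gauss_expect(
        lambda z: min(1.0, math.exp(-ell * ell + math.sqrt(2.0) * ell * z)))

def rhs_235(ell):
    return 2.0 * Phi(-ell / math.sqrt(2.0))

def lhs_236(a, b):
    """E[z (1 ^ e^{a z + b})]."""
    return gauss_expect(lambda z: z * min(1.0, math.exp(a * z + b)))

def rhs_236(a, b):
    return a * math.exp(0.5 * a * a + b) * Phi(-b / abs(a) - abs(a))

print(lhs_235(1.3), rhs_235(1.3))
print(lhs_236(0.7, -0.3), rhs_236(0.7, -0.3))
print(lhs_236(-1.3, 0.4), rhs_236(-1.3, 0.4))
```

The second identity is an instance of Gaussian integration by parts, $\mathbb{E}[z f(z)] = \mathbb{E}[f'(z)]$, applied to $f(z) = 1 \wedge e^{az+b}$.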

The calculations of the expected one step drift and diffusion needed to prove Proposition 2.1 are long
and technical. In order to enhance the readability, in the next two sections we outline our proof strategy
emphasizing the key calculations.

2.6. Heuristic Argument for the Expected Drift

In this section, we will give heuristic arguments which underlie (2.14) from Proposition 2.1. Recall that
$\{\phi_1, \phi_2, \ldots\}$ is an orthonormal basis for $\mathcal{H}$. Let $x_i^k$, $i \leq N$, denote the $i$th coordinate of $x^k$ and $C^N$ denote
the covariance operator on $X^N$, the span of $\{\phi_1, \phi_2, \ldots, \phi_N\}$. Also recall that $\mathcal{F}_k$ denotes the sigma algebra
generated by $\{x^n, \xi^n, \gamma^n,\ n \leq k\}$ and the conditional expectations $\mathbb{E}(\,\cdot\,|\mathcal{F}_k)$ are denoted by $\mathbb{E}_k(\,\cdot\,)$. Thus $\mathbb{E}_0(\,\cdot\,)$
denotes the expectation with respect to $\xi^1$ and $\gamma^1$ with $x^0$ fixed. Also for notational convenience set $x^0 = x$
and $\xi^1 = \xi$. Letting $\mathbb{E}_0^\xi$ denote the expectation with respect to $\xi$, it follows that

$$N\,\mathbb{E}_0(x_i^1 - x_i^0) = N\,\mathbb{E}_0\bigl(\gamma^1 (y_i^1 - x_i^0)\bigr) = N\,\mathbb{E}_0^\xi\Bigl(\alpha(x,\xi)\,\sqrt{\tfrac{2\ell^2}{N}}\,(C^{1/2}\xi)_i\Bigr)$$
$$= \lambda_i\sqrt{2\ell^2 N}\;\mathbb{E}_0^\xi\bigl(\alpha(x,\xi)\,\xi_i\bigr) = \lambda_i\sqrt{2\ell^2 N}\;\mathbb{E}_0^\xi\bigl((1 \wedge e^{Q(x,\xi)})\,\xi_i\bigr).$$ (2.37)

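To approximately evaluate (2.37) using Lemma 2.4, it is easier to first factor $Q(x,\xi)$ into components
involving $\xi_i$ and those orthogonal (under $\mathbb{E}_0^\xi$) to them. To this end we introduce the following terms:

$$R(x,\xi) := -\sqrt{\frac{2\ell^2}{N}}\sum_{j=1}^N \zeta_j\xi_j - \frac{\ell^2}{N}\sum_{j=1}^N \xi_j^2,$$ (2.38)
$$R_i(x,\xi) := -\sqrt{\frac{2\ell^2}{N}}\sum_{j=1,\,j\neq i}^N \zeta_j\xi_j - \frac{\ell^2}{N}\sum_{j=1,\,j\neq i}^N \xi_j^2.$$ (2.39)

Hence for large $N$ (see Lemma 5.5),

$$Q(x,\xi) = R(x,\xi) - r(x,\xi) \approx R_i(x,\xi) - \sqrt{\frac{2\ell^2}{N}}\;\zeta_i\xi_i.$$ (2.40)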

The important observation here is that, conditional on $x$, the random variable $R_i(x,\xi)$ is independent of $\xi_i$.
Hence the expectation $\mathbb{E}_0^\xi\bigl((1 \wedge e^{Q(x,\xi)})\,\xi_i\bigr)$ can be computed by first computing it over $\xi_i$ and then over
$\xi \setminus \xi_i$. Let $\mathbb{E}_0^{\xi_i^-}$, $\mathbb{E}_0^{\xi_i}$ denote the expectation with respect to $\xi \setminus \xi_i$, $\xi_i$ respectively. Using the relation (2.40),
and applying Equation (2.36) with $a = -\sqrt{2\ell^2/N}\;\zeta_i$, $z = \xi_i$ and $b = R_i(x,\xi)$, we obtain (see Lemma 5.6)

$$\mathbb{E}_0^{\xi_i}\bigl((1 \wedge e^{Q(x,\xi)})\,\xi_i\bigr) \approx -\sqrt{\frac{2\ell^2}{N}}\;\zeta_i\; e^{R_i(x,\xi) + \frac{\ell^2}{N}\zeta_i^2}\;\Phi\Bigl(-\frac{R_i(x,\xi)}{\sqrt{2\ell^2/N}\,|\zeta_i|} - \sqrt{\frac{2\ell^2}{N}}\,|\zeta_i|\Bigr).$$ (2.41)

Now again from the relation (2.40) and the approximation of $Q(x,\xi)$ encapsulated in (2.33), it follows that for
sufficiently large $N$

$$R_i(x,\xi) \approx \mathrm{N}(-\ell^2,\ 2\ell^2), \quad \pi\text{-a.s.}$$ (2.42)

Combining (2.41) with the fact that, for large enough $N$, $\Phi\bigl(-R_i(x,\xi)/(\sqrt{2\ell^2/N}\,|\zeta_i|)\bigr) \approx 1_{R_i(x,\xi) < 0}$, we see that
Lemma 2.4 implies that (see Lemmas 5.7 - 5.10)

$$\mathbb{E}_0^{\xi_i^-}\Bigl[e^{R_i(x,\xi) + \frac{\ell^2}{N}\zeta_i^2}\;\Phi\Bigl(-\frac{R_i(x,\xi)}{\sqrt{2\ell^2/N}\,|\zeta_i|}\Bigr)\Bigr] \approx \mathbb{E}_0^{\xi_i^-}\bigl(e^{R_i(x,\xi)}\, 1_{R_i(x,\xi) < 0}\bigr)$$ (2.43)
$$\approx \mathbb{E}\bigl(e^{Z_\ell}\, 1_{Z_\ell < 0}\bigr) = \beta/2,$$ (2.44)

where $Z_\ell \sim \mathrm{N}(-\ell^2, 2\ell^2)$. Hence from (2.37), (2.41) and (2.44), we gather that for large $N$,

$$N\,\mathbb{E}_0(x_i^1 - x_i^0) \approx -\ell^2\beta\,\lambda_i\zeta_i.$$

To identify the drift observe that since $C^{-1/2}$ is self adjoint and $i \leq N$, we have $\lambda_i C^{-1/2}\phi_i = \phi_i$ and

$$\lambda_i\zeta_i = \lambda_i\langle C^{-1/2}(P^N x) + C^{1/2}\nabla\Psi^N(x),\ \phi_i\rangle = \lambda_i\langle C^{-1/2}(P^N x) + C^{-1/2}C\nabla\Psi^N(x),\ \phi_i\rangle$$
$$= \langle P^N x + C^N\nabla\Psi^N(x),\ \phi_i\rangle.$$ (2.45)

Hence for large enough $N$, we deduce (heuristically) that the expected drift in the $i$th coordinate after one
step of the Markov chain $\{x^k\}$ is well approximated by the expression

$$N\,\mathbb{E}_0(x_i^1 - x_i^0) \approx -\ell^2\beta\,\bigl(P^N x + C^N\nabla\Psi^N(x)\bigr)_i.$$

This is an approximation of the drift term that appears in the SPDE (1.9). Therefore the above heuristic
arguments show how the Metropolis algorithm achieves the 'change of measure' by mapping $\pi_0$ to $\pi$. The
above arguments can be made rigorous by quantitatively controlling the errors made. In Section 5, we
quantify the size of the neglected terms and quantify the rate at which $Q$ is well approximated by a Gaussian
distribution. Using these estimates, in Section 5.2 we will retrace the arguments of this section paying
attention to the cumulative error, thereby proving (2.14) of Proposition 2.1.



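2.7. Heuristic Argument for the Expected Diffusion Coefficient

We now give the heuristic arguments for the expected diffusion coefficient, after one step of the Markov
chain $\{x^k\}$. The arguments used here are much simpler than the drift calculations. The strategy is the same
as in the drift case except that now we consider the covariance between two coordinates $x_i^1$ and $x_j^1$. For
$1 \leq i, j \leq N$,

$$N\,\mathbb{E}_0\bigl[(x_i^1 - x_i^0)(x_j^1 - x_j^0)\bigr] = N\,\mathbb{E}_0^\xi\bigl[(y_i^1 - x_i)(y_j^1 - x_j)\,\alpha(x,\xi)\bigr]$$
$$= N\,\mathbb{E}_0^\xi\bigl[(y_i^1 - x_i)(y_j^1 - x_j)\,(1 \wedge \exp Q(x,\xi))\bigr]$$
$$= 2\ell^2\,\mathbb{E}_0^\xi\bigl[(C^{1/2}\xi)_i\,(C^{1/2}\xi)_j\,(1 \wedge \exp Q(x,\xi))\bigr].$$ (2.46)

Now notice that

$$\mathbb{E}_0^\xi\bigl[(C^{1/2}\xi)_i\,(C^{1/2}\xi)_j\bigr] = \lambda_i\lambda_j\,\delta_{ij},$$

where $\delta_{ij} = 1_{i=j}$. Similar to the calculations used when evaluating the expected drift, we define

$$R_{ij}(x,\xi) := -\sqrt{\frac{2\ell^2}{N}}\sum_{k=1,\,k\neq i,j}^N \zeta_k\xi_k - \frac{\ell^2}{N}\sum_{k=1,\,k\neq i,j}^N \xi_k^2$$ (2.47)

and observe that, for $i \neq j$,

$$R(x,\xi) = R_{ij}(x,\xi) - \sqrt{\frac{2\ell^2}{N}}\,(\zeta_i\xi_i + \zeta_j\xi_j) - \frac{\ell^2}{N}\,(\xi_i^2 + \xi_j^2).$$

Hence for sufficiently large $N$ we have $Q(x,\xi) \approx R_{ij}(x,\xi)$. By replacing $Q(x,\xi)$ in Equation (2.46) by
$R_{ij}(x,\xi)$ we can take advantage of the fact that $R_{ij}(x,\xi)$ is conditionally independent of $\xi_i$, $\xi_j$. Moreover, the
additional error term introduced is easy to estimate because the function $f(x) = (1 \wedge e^x)$ is 1-Lipschitz. So,
for large enough $N$ (Lemma 5.12),

$$\mathbb{E}_0^\xi\bigl[(C^{1/2}\xi)_i\,(C^{1/2}\xi)_j\,(1 \wedge \exp Q(x,\xi))\bigr] \approx \mathbb{E}_0^\xi\bigl[(C^{1/2}\xi)_i\,(C^{1/2}\xi)_j\,(1 \wedge \exp R_{ij}(x,\xi))\bigr]$$
$$= \lambda_i\lambda_j\,\delta_{ij}\;\mathbb{E}_0^{\xi_{ij}^-}\bigl[1 \wedge \exp R_{ij}(x,\xi)\bigr],$$ (2.48)

where $\mathbb{E}_0^{\xi_{ij}^-}$ denotes the expectation with respect to $\xi \setminus \{\xi_i, \xi_j\}$. Again as in the drift calculation we have that
$R_{ij}(x,\xi) \approx \mathrm{N}(-\ell^2, 2\ell^2)$, $\pi$-a.s. So by the dominated convergence theorem and Lemma 2.4

$$\lim_{N\to\infty}\mathbb{E}_0^{\xi_{ij}^-}\bigl[1 \wedge \exp R_{ij}(x,\xi)\bigr] = \beta, \quad \pi\text{-a.s.}$$ (2.49)

Therefore for large $N$,

$$N\,\mathbb{E}_0\bigl[(x_i^1 - x_i^0)(x_j^1 - x_j^0)\bigr] \approx 2\ell^2\beta\,\lambda_i\lambda_j\,\delta_{ij},$$

or in other words

$$N\,\mathbb{E}_0\bigl[(x^1 - x^0) \otimes (x^1 - x^0)\bigr] \approx 2\ell^2\beta\, C^N.$$

As with the drift calculations in the last section, these calculations can be made rigorous by tracking the size
of the neglected terms and quantifying the rate at which $Q$ is approximated by the appropriate Gaussian.
We will substantiate these arguments in Section 5.3.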

3. Main Theorem 

In this section we state the assumptions we make on $\pi_0$ and $\Psi$ and then prove our main theorem.

3.1. Assumptions on $\Psi$ and $C$

The assumptions we make now concern (i) the rate of decay of the standard deviations in the prior or
reference measure $\pi_0$, and (ii) the properties of the Radon-Nikodym derivative (likelihood function). These
assumptions are naturally linked: in order for $\pi$ to be well-defined we require that $\Psi$ is $\pi_0$-measurable and
this can be achieved by ensuring that $\Psi$ is continuous on a space which has full measure under $\pi_0$. In fact,
in a wide range of applications, $\Psi$ is Lipschitz on such a space [Stu10]. In this paper we require, in addition,
that $\Psi$ be twice differentiable, in order to define the diffusion limit. This, too, may be established in many
applications. To avoid technicalities we assume that $\Psi(x)$ is quadratically bounded, with first derivative
linearly bounded and second derivative globally bounded. A simple example of a function $\Psi$ satisfying the
above assumptions is $\Psi(x) = \|x\|_s^2$.

Assumptions 3.1. The operator $C$ and functional $\Psi$ satisfy the following:

1. Decay of Eigenvalues $\lambda_i^2$ of $C$: There exist $M_-, M_+ \in (0, \infty)$ and $\kappa > \frac{1}{2}$ such that

$$M_- \leq i^\kappa\lambda_i \leq M_+, \quad \forall i \in \mathbb{Z}_+.$$ (3.1)

2. Assumptions on $\Psi$: There exist constants $M_i \in \mathbb{R}$, $i \leq 4$, and $s \in [0, \kappa - 1/2)$ such that

$$M_1 \leq \Psi(x) \leq M_2\,(1 + \|x\|_s^2) \quad \forall x \in \mathcal{H}^s,$$ (3.2)
$$\|\nabla\Psi(x)\|_{-s} \leq M_3\,(1 + \|x\|_s) \quad \forall x \in \mathcal{H}^s,$$ (3.3)
$$\|\partial^2\Psi(x)\|_{\mathcal{L}(\mathcal{H}^s, \mathcal{H}^{-s})} \leq M_4 \quad \forall x \in \mathcal{H}^s.$$ (3.4)

Notice also that the above assumptions on $\Psi$ imply that for all $x, y \in \mathcal{H}^s$,

$$|\Psi(x) - \Psi(y)| \leq M_5\,(1 + \|x\|_s + \|y\|_s)\,\|x - y\|_s,$$ (3.5a)
$$\Psi(y) = \Psi(x) + \langle\nabla\Psi(x),\ y - x\rangle + \mathrm{rem}(x, y),$$ (3.5b)
$$\mathrm{rem}(x, y) \leq M_6\,\|x - y\|_s^2$$ (3.5c)

for some constants $M_5, M_6 \in \mathbb{R}_+$.

Remark 3.2. The condition $\kappa > \frac{1}{2}$ ensures that the covariance operator for $\pi_0$ is trace class. In fact the
$\mathcal{H}^r$ norm of a realization of a Gaussian measure $\mathrm{N}(0, C)$ defined on $\mathcal{H}$ is almost surely finite if and only if
$r < \kappa - \frac{1}{2}$ [DPZ92]. Thus the choice of Sobolev space $\mathcal{H}^s$, with $s \in [0, \kappa - \frac{1}{2})$, in which we state the assumptions
on $\Psi$ is made to ensure that the Radon-Nikodym derivative of $\pi$ with respect to $\pi_0$ is well-defined. Indeed,
under our assumptions, $\Psi$ is Lipschitz continuous on a set of full $\pi_0$ measure; it is hence $\pi_0$-measurable.
Weaker growth assumptions on $\Psi$, its Lipschitz constant and second derivative could be dealt with by use of
stopping time arguments.
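
The threshold $r < \kappa - \frac{1}{2}$ can be seen in a short computation. The sketch below assumes the model case $\lambda_j = j^{-\kappa}$ and the convention $\|x\|_r^2 = \sum_j j^{2r}x_j^2$, and it looks at the second moment $\mathbb{E}\|x\|_r^2 = \sum_j j^{2(r-\kappa)}$ as a simple proxy for almost sure finiteness; the function name is illustrative.

```python
def expected_sq_norm(r, kappa, n_terms):
    """Partial sum of E||x||_r^2 = sum_j j^{2r} * lambda_j^2 with lambda_j = j^{-kappa}."""
    return sum(j ** (2.0 * (r - kappa)) for j in range(1, n_terms + 1))

kappa = 1.0
for r in (0.25, 0.5):   # below and at the threshold kappa - 1/2
    sums = [expected_sq_norm(r, kappa, n) for n in (10**3, 10**4, 10**5)]
    print(f"r = {r}: partial sums {[round(v, 3) for v in sums]}")
```

For $r = 0.25$ the partial sums settle down, while at $r = \kappa - \frac{1}{2} = 0.5$ they grow like $\log n$.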

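The following lemma will be used repeatedly.

Lemma 3.3. Under Assumptions 3.1 it follows that, for all $a \in \mathbb{R}$,

$$\|C^a x\| \asymp \|x\|_{-2\kappa a}.$$ (3.6)

Furthermore the function $C\nabla\Psi : \mathcal{H}^s \to \mathcal{H}^s$ is globally Lipschitz.

Proof. The first result follows from the inequality

$$\|C^a x\|^2 = \sum_{j=1}^\infty \lambda_j^{4a} x_j^2 \leq M\sum_{j=1}^\infty j^{-4a\kappa} x_j^2 = M\,\|x\|_{-2\kappa a}^2$$

and a similar lower bound, using (3.1). To prove the global Lipschitz property we first note that

$$\nabla\Psi(u_1) - \nabla\Psi(u_2) = K(u_1 - u_2) := \int_0^1 \partial^2\Psi\bigl(t u_1 + (1-t)u_2\bigr)\,dt\;(u_1 - u_2).$$ (3.7)

Note that $\|K\|_{\mathcal{L}(\mathcal{H}^s, \mathcal{H}^{-s})} \leq M_4$ by (3.4). Thus,

$$\|C(\nabla\Psi(u_1) - \nabla\Psi(u_2))\|_s \leq M\,\|C^{1 - s/2\kappa}K(u_1 - u_2)\|$$
$$\leq M\,\|C^{1 - s/2\kappa}\,K\,C^{s/2\kappa}\,C^{-s/2\kappa}(u_1 - u_2)\|$$
$$\leq M\,\|C^{1 - s/2\kappa}\|_{\mathcal{L}(\mathcal{H}^{-s}, \mathcal{H})}\;\|K\|_{\mathcal{L}(\mathcal{H}^s, \mathcal{H}^{-s})}\;\|C^{s/2\kappa}\|_{\mathcal{L}(\mathcal{H}, \mathcal{H}^s)}\;\|u_1 - u_2\|_s.$$

The three linear operators are bounded between the appropriate spaces, in the case of $C^{1 - s/2\kappa}$ by using the
fact that $s < \kappa - \frac{1}{2}$ implies $s < \kappa$. □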
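
The norm equivalence (3.6) can be illustrated in finite dimensions. The sketch below assumes the convention $\|x\|_r^2 = \sum_j j^{2r}x_j^2$ and draws hypothetical values $\lambda_j = c_j\, j^{-\kappa}$ with $c_j \in [M_-, M_+]$; the ratio $\|C^a x\|^2 / \|x\|_{-2\kappa a}^2$ is then a weighted average of the $c_j^{4a}$ and so stays between $M_-^{4a}$ and $M_+^{4a}$. The function name and parameter values are illustrative.

```python
import random

def norm_ratio_sq(a, kappa=1.0, m_minus=0.5, m_plus=2.0, n=200, seed=0):
    """Return ||C^a x||^2 / ||x||_{-2 kappa a}^2 for a random x in R^n."""
    rng = random.Random(seed)
    # lambda_j with m_minus <= j^kappa * lambda_j <= m_plus, as in (3.1)
    lam = [rng.uniform(m_minus, m_plus) * j ** (-kappa) for j in range(1, n + 1)]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # C has eigenvalues lambda_j^2, so C^a has eigenvalues lambda_j^{2a}
    lhs = sum(l ** (4.0 * a) * xj * xj for l, xj in zip(lam, x))
    rhs = sum(j ** (-4.0 * kappa * a) * xj * xj
              for j, xj in zip(range(1, n + 1), x))
    return lhs / rhs

for a in (-0.5, 0.25, 1.0):
    lo, hi = sorted((0.5 ** (4.0 * a), 2.0 ** (4.0 * a)))
    print(f"a = {a}: ratio {norm_ratio_sq(a):.4f} lies in [{lo:.4f}, {hi:.4f}]")
```

The bounds do not depend on the dimension $n$, which is the point of (3.6).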



3.2. Finite Dimensional Approximation of the Invariant Distribution 

For simplicity we assume throughout this paper that $\Psi^N(\,\cdot\,) = \Psi(P^N\,\cdot\,)$. We note again that $\nabla\Psi^N(x) =
P^N\nabla\Psi(P^N x)$ and $\partial^2\Psi^N(x) = P^N\partial^2\Psi(P^N x)P^N$. Other approximations could be handled similarly. The
function $\Psi^N$ may be shown to satisfy the following:

Assumptions 3.4. Assumptions on $\Psi^N$: The functions $\Psi^N : X^N \to \mathbb{R}$ satisfy the same conditions
imposed on $\Psi$ given by Equations (3.2), (3.3) and (3.4) with the same constants uniformly in $N$.

It is straightforward to show that the above assumptions on $\Psi^N$ imply that the sequence of measures $\{\pi^N\}$
converges to $\pi$ in the Hellinger metric (see [CDS]). Therefore the measures $\{\pi^N\}$ are good candidates for
finite dimensional approximations of $\pi$. Furthermore, the normalizing constants $M_{\Psi^N}$ are uniformly bounded
and we use this fact to obtain uniform bounds on moments of functionals in $\mathcal{H}$ under $\pi^N$:

Lemma 3.5. Under the Assumptions 3.4 on $\Psi^N$,

$$\sup_{N \in \mathbb{N}} M_{\Psi^N} < \infty$$

and for any measurable functional $f : \mathcal{H} \to \mathbb{R}$, and any $p \geq 1$,

$$\sup_{N \in \mathbb{N}}\mathbb{E}^{\pi^N}|f(x)|^p \leq M\,\mathbb{E}^{\pi_0}|f(x)|^p.$$ (3.8)

Proof. By definition,

$$M_{\Psi^N}^{-1} = \int_{\mathcal{H}}\exp\{-\Psi^N(x)\}\,\pi_0(dx) \geq \int_{\mathcal{H}}\exp\{-M(1 + \|x\|_s^2)\}\,\pi_0(dx) \geq e^{-2M}\,\mathbb{P}^{\pi_0}(\|x\|_s \leq 1)$$

and therefore $\inf\{M_{\Psi^N}^{-1} : N \in \mathbb{N}\} > 0$, so that $\sup\{M_{\Psi^N} : N \in \mathbb{N}\} < \infty$. Hence for any $f : \mathcal{H} \to \mathbb{R}$,

$$\sup_{N \in \mathbb{N}}\mathbb{E}^{\pi^N}|f(x)|^p \leq \sup_{N \in \mathbb{N}} M_{\Psi^N}\,\mathbb{E}^{\pi_0}\bigl(e^{-\Psi^N(x)}|f(x)|^p\bigr) \leq M\,\mathbb{E}^{\pi_0}|f(x)|^p,$$

proving the lemma. □

The uniform estimate given in (3.8) will be used repeatedly in the sequel.

3.3. Statement and Proof of the Main Theorem 

The assumptions made above allow us to fully state the main result of this article, as outlined in Section 2.4. 

Theorem 3.6. Let the Assumptions 3.1, 3.4 hold. Let the initial condition $x^0$ of the RWM algorithm be such
that $x^0 \sim \pi^N$ and let $z^N(t)$ be a piecewise linear, continuous interpolant of the RWM algorithm (2.12) as
defined in (1.2). Then $z^N(t)$ converges weakly in $C([0,T]; \mathcal{H}^s)$ to the diffusion process $z(t)$ given by Equation
(1.9) with $z(0) \sim \pi$.

Throughout the remainder of the paper we assume that Assumptions 3.1, 3.4 hold, without explicitly
stating this fact. The proof of Theorem 3.6 is given below and relies on Proposition 2.1 stated above and
proved in Section 5, Proposition 2.2 stated above and proved in Section 4, and Lemma 3.7 which we now
state and then prove at the end of this section.

Lemma 3.7. Fix any $T > 0$, any $z^0 \in \mathcal{H}^s$ and any $W \in C([0, T]; \mathcal{H}^s)$. Then the integral Equation (2.23) has
a unique solution $z \in C([0,T]; \mathcal{H}^s)$. Furthermore $z = \Theta(z^0, W)$, where $\Theta : \mathcal{H}^s \times C([0,T]; \mathcal{H}^s) \to C([0, T]; \mathcal{H}^s)$
as defined in (2.23) is continuous.

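Proof of Theorem 3.6. We begin by tracking the error in the Euler approximation argument. As before let
$x^0 \sim \pi^N$ and assume $x(0) = x^0$. Returning to (2.19), using the definitions from (2.16) and Proposition 2.1,
produces

$$x^{k+1} = x^k + \mathbb{E}_k(x^{k+1} - x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\;\Gamma^{k+1,N}$$ (3.9)
$$= x^k + \frac{1}{N}\,d^N(x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\;\Gamma^{k+1,N}$$ (3.10)
$$= x^k - \frac{\ell^2\beta}{N}\,m^N(x^k) + \sqrt{\frac{2\ell^2\beta}{N}}\;\Gamma^{k+1,N} + \frac{r^{k+1,N}}{N}$$ (3.11)

where $d^N(\,\cdot\,)$ is defined as in (2.25) and $r^{k+1,N}$ as in (2.17). By construction $\mathbb{E}_k(\Gamma^{k+1,N}) = 0$ and

$$\mathbb{E}_k(\Gamma^{k+1,N} \otimes \Gamma^{k+1,N}) = \frac{N}{2\ell^2\beta}\Bigl[\mathbb{E}_k\bigl((x^{k+1} - x^k) \otimes (x^{k+1} - x^k)\bigr) - \mathbb{E}_k(x^{k+1} - x^k) \otimes \mathbb{E}_k(x^{k+1} - x^k)\Bigr]$$
$$= C^N + \frac{1}{2\ell^2\beta}\,E^{k+1,N} - \frac{N}{2\ell^2\beta}\Bigl[\mathbb{E}_k(x^{k+1} - x^k) \otimes \mathbb{E}_k(x^{k+1} - x^k)\Bigr]$$ (3.12)

where $E^{k+1,N}$ is as given in (2.18).

Recall $t^k$ given by (2.21) and $W^N$, the linear interpolant of a correctly scaled sum of the $\Gamma^{k,N}$, given by
(2.22). We now define $\widehat{W}^N$ so that (2.27) holds as stated and hence $\Theta(z^0, \widehat{W}^N) = z^N$. Define

$$r_1^N(t) := r^{k+1,N} \quad\text{for } t \in [t^k, t^{k+1}),$$
$$r_2^N(s) := \ell^2\beta\,\bigl(z^N(s) + C\nabla\Psi(z^N(s)) - m^N(\bar{z}^N(s))\bigr),$$

where $r^{k+1,N}$ is given by (2.17), $m^N$ is from (2.16), $\bar{z}^N$ from (2.24), and $z^N$ from (2.26). If

$$\widehat{W}^N(t) := W^N(t) + \frac{1}{\sqrt{2\ell^2\beta}}\;e^N(t)$$

with $e^N(t) = \int_0^t \bigl(r_1^N(u) + r_2^N(u)\bigr)\,du$, then (2.27) holds. To see this observe from (2.26) that

$$z^N(t) = z^0 + \int_0^t d^N(\bar{z}^N(u))\,du + \sqrt{2\ell^2\beta}\;W^N(t)$$
$$= z^0 - \ell^2\beta\int_0^t m^N(\bar{z}^N(u))\,du + \int_0^t r_1^N(s)\,ds + \sqrt{2\ell^2\beta}\;W^N(t)$$
$$= z^0 - \ell^2\beta\int_0^t \bigl(z^N(u) + C\nabla\Psi(z^N(u))\bigr)\,du + \int_0^t \bigl(r_1^N(s) + r_2^N(s)\bigr)\,ds + \sqrt{2\ell^2\beta}\;W^N(t)$$
$$= z^0 - \ell^2\beta\int_0^t \bigl(z^N(u) + C\nabla\Psi(z^N(u))\bigr)\,du + \sqrt{2\ell^2\beta}\;\widehat{W}^N(t)$$

and hence, with this definition of $\widehat{W}^N$, (2.27) holds.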
Furthermore, we claim that

$$\lim_{N\to\infty}\mathbb{E}^{\pi^N}\Bigl(\sup_{t \in [0,T]}\|e^N(t)\|_s^2\Bigr) = 0.$$ (3.13)

To prove this notice that

$$\sup_{t \in [0,T]}\|e^N(t)\|_s^2 \leq M\Bigl(\sup_{t \in [0,T]}\int_0^t \|r_1^N(u)\|_s^2\,du + \sup_{t \in [0,T]}\int_0^t \|r_2^N(u)\|_s^2\,du\Bigr).$$

Also

$$\mathbb{E}^{\pi^N}\sup_{t \in [0,T]}\int_0^t \|r_1^N(u)\|_s^2\,du \leq \mathbb{E}^{\pi^N}\int_0^T \|r_1^N(u)\|_s^2\,du
\leq M\,\frac{1}{N}\sum_{k=1}^N \mathbb{E}^{\pi^N}\|r^{k,N}\|_s^2 = M\,\mathbb{E}^{\pi^N}\|r^{1,N}\|_s^2 \to 0 \quad\text{as } N \to \infty,$$

where we used stationarity of $r^{k,N}$ and (2.14) from Proposition 2.1 in the last step. We now estimate the
second term similarly to complete the proof. Recall that the function $z \mapsto z + C\nabla\Psi(z)$ is Lipschitz on $\mathcal{H}^s$
by Lemma 3.3. Note also that $C^N\nabla\Psi^N(\,\cdot\,) = C P^N\nabla\Psi(P^N\,\cdot\,)$. Thus

$$\|r_2^N(u)\|_s \leq M\,\|z^N(u) - P^N\bar{z}^N(u)\|_s + \|C(I - P^N)\nabla\Psi(P^N\bar{z}^N(u))\|_s$$
$$\leq M\bigl(\|z^N(u) - \bar{z}^N(u)\|_s + \|(I - P^N)\bar{z}^N(u)\|_s\bigr) + \|(I - P^N)C\nabla\Psi(P^N\bar{z}^N(u))\|_s.$$

But for any $u \in [t^k, t^{k+1})$, we have

$$\|z^N(u) - \bar{z}^N(u)\|_s \leq \|x^{k+1} - x^k\|_s \leq \|y^{k+1} - x^k\|_s.$$

This follows from the fact that $\bar{z}^N(u) = x^k$ and $z^N(u) = \frac{1}{\Delta t}\bigl((u - t^k)x^{k+1} + (t^{k+1} - u)x^k\bigr)$, because $x^{k+1} - x^k =
\gamma^{k+1}(y^{k+1} - x^k)$ and $|\gamma^{k+1}| \leq 1$. For $u \in [t^k, t^{k+1})$, we also have

$$\|(P^N - I)\bar{z}^N(u)\|_s = \|(P^N - I)x^k\|_s = \|(P^N - I)x^0\|_s,$$

because $x^k$ is not updated in $\mathcal{H}^s \setminus X^N$, and

$$\|(P^N - I)C\nabla\Psi(P^N\bar{z}^N(u))\|_s = \|(P^N - I)C\nabla\Psi(P^N x^k)\|_s.$$

Hence we have by stationarity that, for all $u \in [0, T]$,

$$\mathbb{E}^{\pi^N}\|r_2^N(u)\|_s^2 \leq M\,\mathbb{E}^{\pi^N}\|y^1 - x^0\|_s^2 + M\,\mathbb{E}^{\pi^N}\bigl(\|(P^N - I)x^0\|_s^2 + \|(P^N - I)C\nabla\Psi(P^N x^0)\|_s^2\bigr).$$

Equation (2.12) shows that $\mathbb{E}^{\pi^N}\|y^1 - x^0\|_s^2 \leq M N^{-1}$. The definition of $P^N$ gives $\mathbb{E}^{\pi^N}\|(P^N - I)x\|_s^2 \leq
N^{-2(r-s)}\,\mathbb{E}^{\pi^N}\|x\|_r^2$ for any $r \in (s, \kappa - 1/2)$. Note that $\mathbb{E}^{\pi^N}\|x\|_r^2$ is finite for $r \in (s, \kappa - 1/2)$ by Lemma 3.5
and the properties of $\pi_0$. Similarly we have that for $r < 2\kappa - s < \kappa + \frac{1}{2}$

$$\mathbb{E}\|C\nabla\Psi(P^N x^0)\|_r^2 \leq M\,\mathbb{E}\,\|C^{1 - (r+s)/2\kappa}\|_{\mathcal{L}(\mathcal{H}, \mathcal{H})}^2\,\|\nabla\Psi(P^N x^0)\|_{-s}^2 \leq M\,\mathbb{E}(1 + \|x^0\|_s^2).$$

Hence we deduce that $\mathbb{E}^{\pi^N}\|r_2^N(u)\|_s^2 \to 0$ uniformly for $u \in [0, T]$. It follows that

$$\mathbb{E}^{\pi^N}\sup_{t \in [0,T]}\int_0^t \|r_2^N(u)\|_s^2\,du \leq \mathbb{E}^{\pi^N}\int_0^T \|r_2^N(u)\|_s^2\,du \leq \int_0^T \mathbb{E}^{\pi^N}\|r_2^N(u)\|_s^2\,du \to 0$$

and we have proved the claim concerning $e^N$ made in (3.13).

The proof concludes with a straightforward application of the continuous mapping theorem. Let $\widehat{W}^N =
W^N + \frac{1}{\sqrt{2\ell^2\beta}}\,e^N$. Let $\Omega$ denote the probability space generating the Markov chain in stationarity. We have
shown that $e^N \to 0$ in $L^2(\Omega; C([0, T]; \mathcal{H}^s))$ and by Proposition 2.2, $W^N$ converges weakly to $W$, a Brownian
motion with covariance operator $C_s$ in $C([0,T]; \mathcal{H}^s)$. Furthermore we also have that $W$ is independent of
$z^0$. Thus $(z^0, \widehat{W}^N)$ converges weakly to $(z^0, W)$ in $\mathcal{H}^s \times C([0, T]; \mathcal{H}^s)$, with $z^0$ and $W$ independent. Notice
that $z^N = \Theta(z^0, \widehat{W}^N)$, where $\Theta$ is defined as in Lemma 3.7. Since $\Theta$ is a continuous map by Lemma 3.7,
we deduce from the continuous mapping theorem that the process $z^N$ converges weakly in $C([0, T]; \mathcal{H}^s)$ to
$z$ with law given by $\Theta(z^0, W)$. Since $W$ is independent of $z^0$, this is precisely the law of the SPDE given by
(1.9). □

Proof of Lemma 3.7. Consider the mapping $z^{(n)} \mapsto z^{(n+1)}$ defined by

$$z^{(n+1)}(t) = z^0 - h(\ell)\int_0^t \bigl(z^{(n)}(s) + C\nabla\Psi(z^{(n)}(s))\bigr)\,ds + \sqrt{2h(\ell)}\;W(t)$$

for arbitrary $z^0 \in \mathcal{H}^s$ and $W \in C([0, T]; \mathcal{H}^s)$. Recall from Lemma 3.3 that $z \mapsto z + C\nabla\Psi(z)$ is globally
Lipschitz on $\mathcal{H}^s$. It is then a straightforward application of the contraction mapping theorem to show that
this mapping has a unique fixed point in $C([0, T]; \mathcal{H}^s)$, for $T$ sufficiently small. Repeated application of the
same idea extends this existence and uniqueness result to arbitrary time-intervals. Let $z_i$ solve (2.23) with
$(z^0, W) = (w_i, W_i)$, $i = 1, 2$. Subtracting the two equations and using the fact that $z \mapsto z + C\nabla\Psi(z)$ is
globally Lipschitz on $\mathcal{H}^s$ gives

$$\|z_1(t) - z_2(t)\|_s \leq \|w_1 - w_2\|_s + M\int_0^t \|z_1(s) - z_2(s)\|_s\,ds + \sqrt{2\ell^2\beta}\;\|W_1(t) - W_2(t)\|_s.$$

Thus

$$\sup_{0 \leq t \leq T}\|z_1(t) - z_2(t)\|_s \leq \|w_1 - w_2\|_s + M\int_0^T \sup_{0 \leq \tau \leq s}\|z_1(\tau) - z_2(\tau)\|_s\,ds + \sqrt{2\ell^2\beta}\,\sup_{0 \leq t \leq T}\|W_1(t) - W_2(t)\|_s.$$

The Gronwall lemma gives continuity in the desired spaces. □
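
The continuity estimate can be observed on a discretised, scalar version of the Ito map. The sketch below is an illustration under stated assumptions, not the construction used in the proof: it replaces $\mathcal{H}^s$ by $\mathbb{R}$, takes the drift $F(z) = z + c\,\nabla\Psi(z)$ with the hypothetical choice $\Psi(z) = z^2/2$ (so $F$ has Lipschitz constant $L = 1 + c$), and uses a left-endpoint rule for the integral in (2.23). For this scheme a discrete Gronwall argument gives the bound $\sup|z_1 - z_2| \leq (|w_1 - w_2| + \sqrt{2h}\,\sup|W_1 - W_2|)\,e^{hLT}$.

```python
import math

def ito_map(z0, W, T, h, F):
    """Left-endpoint discretisation of z(t) = z0 - h * int_0^t F(z(s)) ds + sqrt(2h) W(t).

    W is a list of path values W(t_k), t_k = k * T / n, k = 0..n.
    """
    n = len(W) - 1
    dt = T / n
    z = [z0 + math.sqrt(2.0 * h) * W[0]]
    integral = 0.0
    for k in range(n):
        integral += F(z[k]) * dt
        z.append(z0 - h * integral + math.sqrt(2.0 * h) * W[k + 1])
    return z

# Hypothetical data: two nearby initial conditions and two nearby driving paths.
T, h, c, n = 1.0, 0.5, 0.5, 1000
F = lambda z: (1.0 + c) * z
ts = [k * T / n for k in range(n + 1)]
W1 = [math.sin(3.0 * t) for t in ts]
W2 = [w + 0.01 * math.sin(7.0 * t) for w, t in zip(W1, ts)]
z1 = ito_map(1.00, W1, T, h, F)
z2 = ito_map(1.02, W2, T, h, F)

gap = max(abs(a - b) for a, b in zip(z1, z2))
data_gap = abs(1.00 - 1.02) + math.sqrt(2.0 * h) * max(abs(a - b) for a, b in zip(W1, W2))
bound = data_gap * math.exp(h * (1.0 + c) * T)
print(f"sup |z1 - z2| = {gap:.5f}, Gronwall bound = {bound:.5f}")
```

Small perturbations of $(z^0, W)$ in the product norm therefore produce small perturbations of the output path, which is the continuity used in the continuous mapping argument.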



4. Weak convergence of the Noise Process: Proof of Proposition 2.2 

Throughout we make the standing Assumptions 3.1, 3.4 without explicit mention. The proof of Proposition 2.2
uses the following result concerning triangular martingale increment arrays. The result is similar to
the classical results on triangular arrays of independent increments.

Let $k_N : [0,T] \to \mathbb{Z}_+$ be a sequence of nondecreasing, right-continuous functions indexed by $N$ with
$k_N(0) = 0$ and $k_N(T) \geq 1$. Let $\{M^{k,N}, \mathcal{F}^{k,N}\}_{0 \leq k \leq k_N(T)}$ be an $\mathcal{H}^s$ valued martingale difference array. That
is, for $k = 1, \ldots, k_N(T)$, we have $\mathbb{E}(M^{k,N}|\mathcal{F}^{k-1,N}) = 0$, $\mathbb{E}(\|M^{k,N}\|_s^2\,|\,\mathcal{F}^{k-1,N}) < \infty$ almost surely, and
$\mathcal{F}^{k-1,N} \subset \mathcal{F}^{k,N}$. We will make use of the following result:



Proposition 4.1 ([Ber86], Proposition 5.1). Let $S : \mathcal{H}^s \to \mathcal{H}^s$ be a self-adjoint, positive definite operator
with finite trace. Assume that, for all $x \in \mathcal{H}^s$, $\epsilon > 0$ and $t \in [0, T]$, the following limits hold in probability:

$$\lim_{N\to\infty}\sum_{k=1}^{k_N(T)}\mathbb{E}\bigl(\|M^{k,N}\|_s^2\,\big|\,\mathcal{F}^{k-1,N}\bigr) = T\,\mathrm{trace}(S),$$ (4.1)
$$\lim_{N\to\infty}\sum_{k=1}^{k_N(t)}\mathbb{E}\bigl(\langle M^{k,N}, x\rangle_s^2\,\big|\,\mathcal{F}^{k-1,N}\bigr) = t\,\langle Sx, x\rangle_s,$$ (4.2)
$$\lim_{N\to\infty}\sum_{k=1}^{k_N(T)}\mathbb{E}\bigl(\langle M^{k,N}, x\rangle_s^2\, 1_{|\langle M^{k,N}, x\rangle_s| \geq \epsilon}\,\big|\,\mathcal{F}^{k-1,N}\bigr) = 0.$$ (4.3)

Define a continuous time process $W^N$ by $W^N(t) = \sum_{k=1}^{k_N(t)} M^{k,N}$ if $k_N(t) \geq 1$ and $k_N(t) > \lim_{r\to 0+} k_N(t - r)$,
and by linear interpolation otherwise. Then the sequence of random variables $W^N$ converges weakly in
$C([0,T]; \mathcal{H}^s)$ to an $\mathcal{H}^s$ valued Brownian motion $W$, with $W(0) = 0$, $\mathbb{E}(W(T)) = 0$, and with covariance
operator $S$.

Remark 4.2. The first two hypotheses of the above theorem ensure the weak convergence of finite dimensional
distributions of $W^N(t)$ using the martingale central limit theorem in $\mathbb{R}^N$; the last hypothesis is needed to
verify the tightness of the family $\{W^N(\,\cdot\,)\}$. As noted in [CW98], the second hypothesis (Equation (4.2)) of
Proposition 4.1 is implied by

$$\lim_{N\to\infty}\sum_{k=1}^{k_N(t)}\mathbb{E}\bigl(\langle M^{k,N}, e_n\rangle_s\,\langle M^{k,N}, e_m\rangle_s\,\big|\,\mathcal{F}^{k-1,N}\bigr) = t\,\langle Se_n, e_m\rangle_s$$ (4.4)

in probability, where $\{e_n\}$ is any orthonormal basis for $\mathcal{H}^s$. The third hypothesis in Equation (4.3) is implied
by the Lindeberg type condition:

$$\lim_{N\to\infty}\sum_{k=1}^{k_N(T)}\mathbb{E}\bigl(\|M^{k,N}\|_s^2\, 1_{\|M^{k,N}\|_s \geq \epsilon}\,\big|\,\mathcal{F}^{k-1,N}\bigr) = 0$$ (4.5)

in probability, for any fixed $\epsilon > 0$.

Using Proposition 4.1 we now give the proof of Proposition 2.2. 

Proof of Proposition 2.2. We apply Proposition 4.1 with $k_N(t) := \lfloor Nt\rfloor$, $M^{k,N} := \frac{1}{\sqrt{N}}\,\Gamma^{k,N}$ and $S := C_s$;
the resulting definition of $W^N(t)$ from Proposition 4.1 coincides with that given in Equation (2.22). We set
$\mathcal{F}^{k,N}$ to be the sigma algebra generated by $\{x^j, \xi^j\}_{j \leq k}$ with $x^0 \sim \pi^N$. Since the chain is stationary, the noise
process $\{\Gamma^{k,N}, 1 \leq k \leq N\}$ is identically distributed, and so are the errors $r^{k,N}$ and $E^{k,N}$ from (2.17) and
(2.18) respectively. We now verify the three hypotheses required to apply Proposition 4.1. We generalize the
notation $\mathbb{E}_0^\xi(\,\cdot\,)$ from subsection 2.6 and set $\mathbb{E}^\xi(\,\cdot\,|\mathcal{F}^{k,N}) = \mathbb{E}_k^\xi(\,\cdot\,)$.

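• Condition (4.1): It is enough to show that

$$\lim_{N\to\infty}\mathbb{E}^{\pi^N}\Bigl|\frac{1}{N}\sum_{k=1}^{\lfloor NT\rfloor}\mathbb{E}^\xi_{k-1}\bigl(\|\Gamma^{k,N}\|_s^2\bigr) - T\,\mathrm{trace}(C_s)\Bigr| = 0$$

and condition (4.1) will follow from Markov's inequality. By (3.12) and (2.2),

$$\mathbb{E}^\xi_0\bigl(\|\Gamma^{1,N}\|_s^2\bigr) = \sum_{j=1}^N \mathbb{E}^\xi_0\bigl(\langle\Gamma^{1,N}, \hat\phi_j\rangle_s^2\bigr)$$ (4.6)
$$= \mathrm{trace}(C_s^N) + \frac{1}{2\ell^2\beta}\sum_{j=1}^N \langle\hat\phi_j, E^{1,N}\hat\phi_j\rangle_s - \frac{N}{2\ell^2\beta}\,\|\mathbb{E}_0(x^1 - x^0)\|_s^2.$$ (4.7)

By Proposition 2.1 it follows that $\mathbb{E}^{\pi^N}\bigl|\sum_{j=1}^N \langle\hat\phi_j, E^{1,N}\hat\phi_j\rangle_s\bigr| \to 0$. For the third term notice that by
Proposition 2.1 (2.14) we have

$$\mathbb{E}^{\pi^N}\frac{N}{2\ell^2\beta}\,\|\mathbb{E}_0(x^1 - x^0)\|_s^2 \leq M\,\frac{1}{N}\,\mathbb{E}^{\pi^N}\bigl(\|m^N(x^0)\|_s^2 + \|r^{1,N}\|_s^2\bigr)
\leq M\,\frac{1}{N}\bigl(\mathbb{E}^{\pi^N}(1 + \|x^0\|_s)^2 + \mathbb{E}^{\pi^N}\|r^{1,N}\|_s^2\bigr) \to 0,$$ (4.8)

where the second inequality follows from the fact that $C\nabla\Psi$ is globally Lipschitz in $\mathcal{H}^s$. Also $\{E^{k,N}\}$
is a stationary sequence. Therefore

$$\mathbb{E}^{\pi^N}\Bigl|\frac{1}{N}\sum_{k=1}^{\lfloor NT\rfloor}\mathbb{E}^\xi_{k-1}\bigl(\|\Gamma^{k,N}\|_s^2\bigr) - T\,\mathrm{trace}(C_s^N)\Bigr|
\leq M\,\mathbb{E}^{\pi^N}\Bigl(\Bigl|\sum_{j=1}^N \langle\hat\phi_j, E^{1,N}\hat\phi_j\rangle_s\Bigr| + N\,\|\mathbb{E}_0(x^1 - x^0)\|_s^2\Bigr) + \mathrm{trace}(C_s^N)\,\Bigl|\frac{\lfloor NT\rfloor}{N} - T\Bigr| \to 0.$$

Condition (4.1) now follows from the fact that $\lim_{N\to\infty}|\mathrm{trace}(C_s) - \mathrm{trace}(C_s^N)| = 0$.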

• Condition (4.2): by Remark 4.2, it is enough to verify (4.4). To show (4.4), using stationarity and 
similar arguments used in verifying condition (4.1), it suffices to show that 



lim E 77 

N— >oo 



E^r 1 '", ^.(T 1 '" , fa,) s ) - (fa, C? 4>m) 







(4.9) 



where {fa} is as defined in (2.7). We have 



E 71 



^({T^^.iT^M.) ~ (fa,C? <f m ) 



E«((r 1 ' 7V > n ) s (r 1 ' 7V , <t> m ) s ) - (fa,c» fa,) 



'n;C s ( t ) m)s 



and therefore it is enough to show that

lim E*" E^((r 1,JV , fa) s (T 1,N , <p m )s) 

iV— »oo 

Indeed we have 

(r 1 ^,^), (r 1 - JV ,0 m ) s - (r^ N ,B s fa) (r 1 ^,^^) = (B a 

and from (3.12) and Proposition 2.1 we obtain 

(fa, bI r 1 ^ ® r 1 -" bI <b m ) s - (fa, cf </> m ) s - (0„, bI r 1 ^ ® r 1 -" £ 



(4.10) 



r^B s fa) 



s Vtn/s Win 



B| C N BI fa 



N 



= n s m s (0„, vS 1 ^^ - ^E ((a; 1 - x°, n ) s )E o (( a ; 1 - a; , m ) s ) . 
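From Proposition 2.1, it follows that $\lim_{N\to\infty}\mathbb{E}^{\pi^N}|\langle\hat\phi_n, E^{1,N}\hat\phi_m\rangle_s| = 0$. Also notice that

$$\Bigl[N\,\mathbb{E}^{\pi^N}\bigl|\mathbb{E}_0(\langle x^1 - x^0, \hat\phi_n\rangle_s)\,\mathbb{E}_0(\langle x^1 - x^0, \hat\phi_m\rangle_s)\bigr|\Bigr]^2 \leq M\,\mathbb{E}^{\pi^N}\bigl(N\,\|\mathbb{E}_0(x^1 - x^0)\|_s^2\,\|\hat\phi_n\|_s^2\bigr)\;\mathbb{E}^{\pi^N}\bigl(N\,\|\mathbb{E}_0(x^1 - x^0)\|_s^2\,\|\hat\phi_m\|_s^2\bigr) \to 0$$

by the calculation done in (4.8). Thus Equation (4.10) holds and since $|\langle\hat\phi_n, C_s\hat\phi_m\rangle_s - \langle\hat\phi_n, C_s^N\hat\phi_m\rangle_s| \to
0$, Equation (4.2) follows from Markov's inequality.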

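• Condition (4.3): from Remark 4.2 it follows that verifying (4.5) suffices to establish (4.3).
To verify Equation (4.5), notice that for any $\epsilon > 0$

$$\mathbb{E}^{\pi^N}\Bigl|\frac{1}{N}\sum_{k=1}^{\lfloor NT\rfloor}\mathbb{E}^\xi_{k-1}\bigl(\|\Gamma^{k,N}\|_s^2\, 1_{\{\|\Gamma^{k,N}\|_s \geq \epsilon\sqrt{N}\}}\bigr)\Bigr| \leq \frac{\lfloor NT\rfloor}{N}\,\mathbb{E}^{\pi^N}\bigl(\|\Gamma^{1,N}\|_s^2\, 1_{\{\|\Gamma^{1,N}\|_s \geq \epsilon\sqrt{N}\}}\bigr) \to 0$$

by the dominated convergence theorem since $\lim_{N\to\infty}\mathbb{E}^{\pi^N}\|\Gamma^{1,N}\|_s^2 = \mathrm{trace}(C_s) < \infty$. Thus (4.5) is
verified.

Thus we have verified all three hypotheses of Proposition 4.1, proving that $W^N(t)$ converges weakly to $W(t)$
in $C([0,T]; \mathcal{H}^s)$.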

Recall that $X_R \subset \mathcal{H}^s$ denotes the $R$-dimensional subspace $P^R \mathcal{H}^s$. To prove the second claim of Proposition
2.2, we need to show that $(x^0, W^N(t))$ converges weakly to $(z^0, W(t))$ in $(\mathcal{H}^s, C([0,T];\mathcal{H}^s))$ as $N \to \infty$, where
$z^0 \sim \pi$ and $z^0$ is independent of the limiting noise $W$. To show this, it is enough to show that for any
$R \in \mathbb{N}$, the pair $(x^0, P^R W^N(t))$ converges weakly to $(z^0, Z_R)$ for every $t > 0$, where $Z_R$ is a Gaussian
random variable on $X_R$ with mean zero and covariance $t\,P^R C_s P^R$, independent of $z^0$. We will prove this
statement as a corollary of the following lemma.

Lemma 4.3. Let $x^0 \sim \pi^N$ and let $\{\theta^{k,N}\}$ be any stationary martingale sequence adapted to the filtration
$\{\mathcal{F}^{k,N}\}$. Furthermore, assume that there exists a stationary sequence $\{U^{k,N}\}$ such that for all $k \ge 1$ and
any $u \in X_R$,

1. $\mathbb{E}_{k-1}|\langle u, P^R\theta^{k,N}\rangle_s|^2 = \langle u, P^R C_s u\rangle_s + U^{k,N}$, with $\lim_{N\to\infty}\mathbb{E}^{\pi^N}|U^{1,N}| = 0$.

2. $\mathbb{E}_{k-1}\|\theta^{k,N}\|_s^3 \le M$.

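Then for any $\mathbf{t} \in \mathcal{H}^s$, $u \in X_R$, $R \in \mathbb{N}$ and $t > 0$,

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Big(e^{\,i\langle \mathbf{t}, x^0\rangle_s + \frac{i}{\sqrt{N}}\sum_{k=1}^{\lfloor Nt\rfloor}\langle u, P^R\theta^{k,N}\rangle_s}\Big) = \mathbb{E}^{\pi}\Big(e^{\,i\langle \mathbf{t}, z^0\rangle_s - \frac{t}{2}\langle u, P^R C_s u\rangle_s}\Big). \qquad (4.11)$$

Note: Here and in Corollary 4.4, $i = \sqrt{-1}$.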

Proof. We show (4.11) for t = 1, since the calculations are nearly identical for an arbitrary t with minor 
notational changes. Indeed, we have 

^ ( /t, I ^^EL I (»,^» t ^) !)=E /(4_ i( i(t,').^Et 1 (,P s ^% ) ). 
By Taylor expansion, 



E^(Ei r _ 1 (e l(t ' a;0>s+ ^ 5: "-^ pi? ' e ' i ' N>a )) 



E 



(t,*° )s+ ^ ££i>,P**^> 8 ( 1 _ _L E ^ iKm , pW> 8 | 2 + m{^j- 2 V n A 2; 



(4.12) 

where $|V^N| \le \mathbb{E}_{N-1}|\langle u, P^R\theta^{N,N}\rangle_s|^3 \le M$, since by assumption $\mathbb{E}_{N-1}\|\theta^{N,N}\|_s^3 \le M$. We also have that

$$\mathbb{E}_{N-1}|\langle u, P^R\theta^{N,N}\rangle_s|^2 = \langle u, P^R C_s u\rangle_s + U^{N,N}, \qquad \lim_{N\to\infty}\mathbb{E}^{\pi^N}|U^{N,N}| = 0.$$

Thus from (4.12) we deduce that



E x 



s 



N 



\S N \ < ME" (^|E^h 



1^1) = M 1(E^(|^| + -^04.13) 



Proceeding recursively we obtain, 



E 71 " 



e«(ty>.( 1 _^_ {tt|P « c7 , tt> .^ 



A' 



k=l 



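By the stationarity of $\{U^{k,N}\}$ and the fact that $\mathbb{E}^{\pi^N}|U^{k,N}| \to 0$ as $N \to \infty$, from (4.13) it follows that

$$\sum_{k=1}^N |S_k| \le \frac{M}{N}\sum_{k=1}^N\Big(\mathbb{E}^{\pi^N}|U^{k,N}| + \frac{1}{\sqrt{N}}\Big) \le M\Big(\mathbb{E}^{\pi^N}|U^{1,N}| + \frac{1}{\sqrt{N}}\Big) \to 0.$$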



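Thus we have shown that

$$\mathbb{E}^{\pi^N}\Big(e^{\,i\langle \mathbf{t}, x^0\rangle_s + \frac{i}{\sqrt{N}}\sum_{k=1}^{N}\langle u, P^R\theta^{k,N}\rangle_s}\Big) = \mathbb{E}^{\pi^N}\Big(e^{\,i\langle \mathbf{t}, x^0\rangle_s}\Big)\Big(1 - \frac{1}{2N}\langle u, P^R C_s u\rangle_s\Big)^N + o(1),$$

and the result follows from the fact that $\mathbb{E}^{\pi^N} e^{\,i\langle \mathbf{t}, x^0\rangle_s} \to \mathbb{E}^{\pi} e^{\,i\langle \mathbf{t}, z^0\rangle_s}$, finishing the proof of Lemma 4.3.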



□ 
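The recursive Taylor-expansion step in the proof above can be illustrated numerically in the simplest scalar setting. The sketch below is only an illustration under a strong simplifying assumption: the martingale differences are replaced by i.i.d. symmetric signs $\pm\sigma$ (a hypothetical stand-in for $\langle u, P^R\theta^{k,N}\rangle_s$), for which the characteristic function of the normalized sum is available exactly. It compares that exact value with the product $(1 - \sigma^2/(2N))^N$ and with the Gaussian limit $e^{-\sigma^2/2}$. The function names are ours and do not come from the paper.

```python
import math


def char_fn_normalized_sum(sigma: float, n: int) -> float:
    """Exact E[exp(i * S_n / sqrt(n))] when S_n sums n i.i.d. signs +/- sigma.

    For a symmetric two-point increment theta, E[exp(i * a * theta)] = cos(a * sigma),
    so the characteristic function of the normalized sum is cos(sigma / sqrt(n)) ** n.
    """
    return math.cos(sigma / math.sqrt(n)) ** n


def taylor_product(sigma: float, n: int) -> float:
    """Product of n second-order Taylor factors (1 - sigma^2 / (2 n))."""
    return (1.0 - sigma ** 2 / (2.0 * n)) ** n


def gaussian_limit(sigma: float) -> float:
    """Limit exp(-sigma^2 / 2): the N(0, sigma^2) characteristic function at 1."""
    return math.exp(-sigma ** 2 / 2.0)


if __name__ == "__main__":
    sigma = 1.3  # illustrative value only
    for n in (10, 100, 1_000, 10_000):
        print(n, char_fn_normalized_sum(sigma, n), taylor_product(sigma, n),
              gaussian_limit(sigma))
```

Both approximations approach the Gaussian value as $N$ grows, which is the scalar analogue of the $o(1)$ error accumulated over the $N$ recursive steps.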



As a corollary of Lemma 4.3, we obtain: 



Corollary 4.4. The pair $(x^0, W^N)$ converges weakly to $(z^0, W)$ in $C([0,T];\mathcal{H}^s)$ where $W$ is a Brownian
motion with covariance operator $C_s$ and is independent of $z^0$ almost surely.



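Proof. As mentioned before, it is enough to show that for any $\mathbf{t} \in \mathcal{H}^s$, $u \in X_R$, $R \in \mathbb{N}$ and $t > 0$,

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Big(e^{\,i\langle \mathbf{t}, x^0\rangle_s + \frac{i}{\sqrt{N}}\sum_{k=1}^{\lfloor Nt\rfloor}\langle u, P^R\Gamma^{k,N}\rangle_s}\Big) = \mathbb{E}^{\pi}\Big(e^{\,i\langle \mathbf{t}, z^0\rangle_s - \frac{t}{2}\langle u, P^R C_s u\rangle_s}\Big). \qquad (4.14)$$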



Now we verify the conditions of Lemma 4.3 to show (4.14). To verify the first hypothesis of Lemma 4.3, 
notice that from Proposition 2.1 we obtain that for k > 1, 



R r k,N\ |2 _ w £ 



rfe.JV 



E|_ 1 (.B a ii J P K r*' A (8r* ,JV B,«) = (u,P K C s it) s + t/'' 

JiAJV 



where $\{E^{k,N}\}$ is as defined in (2.18). Because $\{\Gamma^{k,N}\}$ is stationary, we deduce that $\{U^{k,N}\}$ is stationary.
From Proposition 2.1 we obtain 



RAN 



lim V W' \{h,P M E k ' N &) s \ = 

TV— >-oo * — ' 



1,3=1 

andE wN ^\\El_ 1 (x k -x k - 1 )\\ 2 s -> by the calculation done in (4.8). Thus we have shown that E^C/ 1 ^ -> 

as AT -> oo. The second hypothesis of Lemma 4.3 is easily verified since E|_ 1 ||r fc ^ Ar ||^ < ME|_ 1 ||C 1 / 2 ^ fc ||^ < 
M. Thus the corollary follows from Lemma 4.3. □ 

Thus we have shown that $(x^0, W^N)$ converges weakly to $(z^0, W)$ where $W$ is a Brownian motion in $\mathcal{H}^s$
with covariance operator $C_s$, and by the above corollary we see that $W$ is independent of $x^0$ almost surely,
proving the two claims made in Proposition 2.2 and we are done. □






5. Mean Drift and Diffusion: Proof of Proposition 2.1 



To prove this key proposition we make the standing Assumptions 3.1, 3.4 from Section 3.1 without explicit 
statement of this fact within the individual lemmas. We start with several preliminary bounds and then 
consider the drift and diffusion terms respectively. 



5.1. Preliminary Estimates 

Recall the definitions of $R(x,\xi)$, $R_i(x,\xi)$, and $R_{ij}(x,\xi)$ from Equations (2.38), (2.39) and (2.47) respectively.
These quantities were introduced so that the term in the exponential of the acceptance probability, $Q(x,\xi)$,
could be replaced with $R_i(x,\xi)$ and $R_{ij}(x,\xi)$ to take advantage of the fact that, conditional on $x$, $R_i(x,\xi)$
is independent of $\xi_i$ and $R_{ij}(x,\xi)$ is independent of $\xi_i, \xi_j$. In the next lemma, we estimate the additional
error due to this replacement of $Q(x,\xi)$. Recall that $\mathbb{E}^{\xi}_0$ denotes expectation with respect to $\xi = \xi^0$ as in
Subsection 2.2.

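Lemma 5.1.

$$\mathbb{E}^{\xi}_0\,|Q(x,\xi) - R_i(x,\xi)|^2 \le \frac{M}{N}\,(1 + |\zeta_i|^2). \qquad (5.1)$$

$$\mathbb{E}^{\xi}_0\,\big(Q(x,\xi) - R_{ij}(x,\xi)\big)^2 \le \frac{M}{N}\,(1 + |\zeta_i|^2 + |\zeta_j|^2). \qquad (5.2)$$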
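Proof. Since $\xi_j$ are i.i.d. $N(0,1)$, using (2.1) and (3.1), we obtain that

$$\mathbb{E}\|C^{1/2}\xi\|_s^4 \le 3\,\big(\mathbb{E}\|C^{1/2}\xi\|_s^2\big)^2 \le M\Big(\sum_{j=1}^{\infty} j^{2s-2\kappa}\Big)^2 < \infty \qquad (5.3)$$

since $s < \kappa - \frac{1}{2}$.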

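Starting from (2.40), the estimates in (2.32) and (5.3) imply that

$$\mathbb{E}^{\xi}_0|Q(x,\xi) - R_i(x,\xi)|^2 \le M\,\mathbb{E}^{\xi}_0\Big(|r(x,\xi)|^2 + \frac{1}{N}\zeta_i^2\xi_i^2 + \frac{1}{N^2}\xi_i^4\Big) \le M\Big(\frac{1}{N^2}\,\mathbb{E}\|C^{1/2}\xi\|_s^4 + \frac{1}{N}\zeta_i^2 + \frac{1}{N^2}\Big) \le \frac{M}{N}\,(1 + \zeta_i^2),$$

verifying the first part of the lemma. A very similar argument for the second part finishes the proof. □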

The random variables $R(x,\xi)$, $R_i(x,\xi)$ and $R_{ij}(x,\xi)$ are approximately Gaussian random variables. Indeed
it can be readily seen that

$$R(x,\xi) \approx N\Big(-\ell^2,\; 2\ell^2\,\frac{\|\zeta\|^2}{N}\Big).$$

The next lemma contains a crucial observation. We show that the sequence of random variables $\{\|\zeta\|^2/N\}$
converges to 1 almost surely under both $\pi_0$ and $\pi$. Thus, for almost every $x$, $R(x,\xi)$ converges weakly to $Z_\ell \sim N(-\ell^2, 2\ell^2)$,
and thus the expected acceptance probability $\mathbb{E}\,\alpha(x,\xi) = \mathbb{E}(1 \wedge e^{Q(x,\xi)})$ converges to $\beta = \mathbb{E}(1 \wedge e^{Z_\ell})$.
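To make the limiting acceptance probability concrete, the following minimal sketch estimates $\beta = \mathbb{E}(1 \wedge e^{Z_\ell})$ by Monte Carlo for $Z_\ell \sim N(-\ell^2, 2\ell^2)$ and compares it with the closed form $2\Phi(-\ell/\sqrt{2})$. The closed form is not stated in this section; it is a standard Gaussian computation that we assume here. The function names, sample size and seed are illustrative choices of ours.

```python
import math
import random


def std_normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def beta_closed_form(ell: float) -> float:
    """Assumed closed form: E[min(1, exp(Z))] = 2 * Phi(-ell / sqrt(2))."""
    return 2.0 * std_normal_cdf(-ell / math.sqrt(2.0))


def beta_monte_carlo(ell: float, n_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[min(1, exp(Z))] for Z ~ N(-ell^2, 2 ell^2)."""
    rng = random.Random(seed)
    mean, std = -ell ** 2, math.sqrt(2.0) * ell
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(mean, std)
        # min(1, exp(z)) written so that exp is only evaluated for z < 0
        total += 1.0 if z >= 0.0 else math.exp(z)
    return total / n_samples


if __name__ == "__main__":
    for ell in (0.5, 1.0, 2.0):
        print(ell, beta_monte_carlo(ell), beta_closed_form(ell))
```

The two columns should agree up to Monte Carlo error, and $\beta$ decreases as the proposal scale $\ell$ grows.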

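Lemma 5.2. As $N \to \infty$ we have

$$\frac{1}{N}\|\zeta\|^2 \to 1, \;\; \pi_0\text{-a.s.} \quad\text{and}\quad \frac{1}{N}\|\zeta\|^2 \to 1, \;\; \pi\text{-a.s.} \qquad (5.4)$$

Furthermore, for any $m \in \mathbb{N}$, $\alpha \ge 2$, $s < \kappa - \frac{1}{2}$ and for any $c \ge 0$,

$$\limsup_{N\in\mathbb{N}}\; \mathbb{E}^{\pi^N} \sum_{j=1}^{N} \lambda_j^{\alpha}\, j^{2s}\, |\zeta_j|^m\, e^{\frac{c}{N}\|\zeta\|^2} < \infty. \qquad (5.5)$$

Finally, we have

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Big(\Big|1 - \frac{1}{N}\|\zeta\|^2\Big|^2\Big) = 0. \qquad (5.6)$$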

Proof. The proof proceeds by showing the conclusions first in the case when $x \sim \pi_0$; this is easier because
the finite dimensional distributions are Gaussian and by Fernique's theorem $x$ has exponential moments.
Next we notice that the almost sure properties are preserved under the change of measure to $\pi$. To show the
convergence of moments, we use our hypothesis that the Radon-Nikodym derivative $\frac{d\pi^N}{d\pi_0}$ is bounded from
above independently of $N$, as shown in Lemma 3.5, Equation (3.8).

Indeed, first let $x \sim \pi_0$. Recall that $\zeta = C^{-1/2}(P^N x) + C^{1/2}\nabla\Psi^N(x)$ and

$$\|\nabla\Psi^N(x)\|_{-s} \le M_3\,(1 + \|x\|_s). \qquad (5.7)$$

Using (3.6) and the fact that $s < \kappa - \frac{1}{2}$ so that $-\kappa < -s$ we deduce that

$$\|C^{1/2}\nabla\Psi^N(x)\| \le M\,\|\nabla\Psi^N(x)\|_{-s} \le M\,(1 + \|x\|_s)$$

uniformly in $N$. Also since $x$ is Gaussian under $\pi_0$, from Equation (2.4), we may write $C^{-1/2}(P^N x) =
\sum_{k=1}^{N} \rho_k \phi_k$, where the $\rho_k$ are i.i.d. $N(0,1)$. Note that

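$$\frac{1}{N}\|\zeta\|^2 = \frac{1}{N}\big\|C^{-1/2}(P^N x) + C^{1/2}\nabla\Psi^N(x)\big\|^2$$
$$= \frac{1}{N}\Big(\|C^{-1/2}(P^N x)\|^2 + 2\langle C^{-1/2}(P^N x),\, C^{1/2}\nabla\Psi^N(x)\rangle + \|C^{1/2}\nabla\Psi^N(x)\|^2\Big)$$
$$= \frac{1}{N}\Big(\|C^{-1/2}(P^N x)\|^2 + 2\langle P^N x,\, \nabla\Psi^N(x)\rangle + \|C^{1/2}\nabla\Psi^N(x)\|^2\Big)$$
$$= \frac{1}{N}\sum_{k=1}^{N}\rho_k^2 + \gamma \qquad (5.8)$$

where

$$|\gamma| \le \frac{1}{N}\Big(2\|x\|_s\,\|\nabla\Psi^N(x)\|_{-s} + \|C^{1/2}\nabla\Psi^N(x)\|^2\Big) \le \frac{M}{N}\Big(2\|x\|_s(1 + \|x\|_s) + (1 + \|x\|_s)^2\Big). \qquad (5.9)$$

Under $\pi_0$, we have $\|x\|_s < \infty$ a.s., for $s < \kappa - \frac{1}{2}$, and hence by Equation (5.9), we conclude that $|\gamma| \to 0$
almost surely as $N \to \infty$. Now, by the strong law of large numbers, $\frac{1}{N}\sum_{k=1}^{N}\rho_k^2 \to 1$ almost surely. Hence
from Equation (5.8) we obtain that under $\pi_0$, $\lim_{N\to\infty} \frac{1}{N}\|\zeta\|^2 = 1$ almost surely, proving the first equation
in (5.4). Now the second equation in (5.4) follows by noting that almost sure limits are preserved under an
(absolutely continuous) change of measure.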
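The law-of-large-numbers step can be checked with a few lines of simulation. The sketch below assumes the simplest case in which the correction term is absent (for instance $\Psi \equiv 0$, an assumption made only for this illustration), so that $\frac{1}{N}\|\zeta\|^2$ reduces to the average of $N$ squared i.i.d. standard normals. The function name and seed are hypothetical.

```python
import random


def normalized_sq_norm(n: int, rng: random.Random) -> float:
    """Return (1/n) * sum of n squared i.i.d. N(0, 1) draws.

    This is the leading term of (1/N) * ||zeta||^2 under the Gaussian
    reference measure, ignoring the lower-order correction.
    """
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (10, 100, 10_000, 100_000):
        print(n, normalized_sq_norm(n, rng))
```

The printed values concentrate around 1 at rate $O(N^{-1/2})$, consistent with the variance $2/N$ of a normalized chi-squared variable.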

Next notice that by Equation (5.8) and the Cauchy-Schwarz inequality, for any $c > 0$:

( E ^o e ci!lcil 2 )2 < ^o^fEPfeVE^e 207 ) < (V°e 2 ^ £ ol\ Jpo e #IWl!j . 

Using the fact that $\sum_{k=1}^{N}\rho_k^2$ has a chi-squared distribution with $N$ degrees of freedom gives

(ETo e ciHCf)2 < Af e -^ ^E"o e ^||«||2^ < M (5 10 ) 

where the last inequality follows from Fernique's theorem since $\mathbb{E}^{\pi_0} e^{\frac{M}{N}\|x\|_s^2} < \infty$ for sufficiently large $N$.
Hence, by applying Lemma 3.5, Equation (3.8), it follows that $\limsup_{N\to\infty} \mathbb{E}^{\pi^N} e^{\frac{c}{N}\|\zeta\|^2} < \infty$. Notice that
we also have the bound

\c k r<M(\ Pk r + \x k r(i + \\x\n). 
Since $s < \kappa - 1/2$ we have that $\sum_{j=1}^{\infty} \lambda_j^2\, j^{2s} < \infty$ and therefore it follows that for $\alpha \ge 2$,



TV 



limsup^] (E 7 ^ A^ Q j 2s |Cfe 



2m \ 1/2 



< 00. 



JV-s-oo 



(5.11) 



fe=i 



Hence the claim in (5.5) follows from applying Cauchy-Schwarz combined with (5.10) and (5.11). Similarly,
a straightforward calculation yields that $\mathbb{E}^{\pi_0}\big(|1 - \frac{1}{N}\|\zeta\|^2|^2\big) \le \frac{M}{N}$. Hence again by Lemma 3.5,

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N}\Big(\Big|1 - \frac{1}{N}\|\zeta\|^2\Big|^2\Big) = 0,$$

proving the last claim and we are done.



□ 



Recall that $Q(x,\xi) = R(x,\xi) - r(x,\xi)$. Thus from (2.32) and Lemma 5.1 it follows that $R_i(x,\xi)$ and
$R_{ij}(x,\xi)$ also are approximately Gaussian. Therefore the conclusion of Lemma 5.2 suggests
that, for any fixed realization of $x \sim \pi$, the random variables $R(x,\xi)$, $R_i(x,\xi)$ and $R_{ij}(x,\xi)$ all converge
to the same weak limit $Z_\ell \sim N(-\ell^2, 2\ell^2)$ as the dimension of the noise $\xi$ goes to $\infty$. In the rest of this
subsection, we make this argument rigorous by deriving a Berry-Esseen bound for the weak convergence of $R(x,\xi)$
to $Z_\ell$.

For this purpose, it is natural and convenient to obtain these bounds in the Wasserstein metric. Recall
that the Wasserstein distance between two random variables, $\mathrm{Wass}(X, Y)$, is defined by

$$\mathrm{Wass}(X, Y) = \sup_{f \in \mathrm{Lip}_1} \mathbb{E}\big(f(X) - f(Y)\big)$$

where $\mathrm{Lip}_1$ is the class of 1-Lipschitz functions. The following lemma gives a bound for the Wasserstein
distance between $R(x,\xi)$ and $Z_\ell$.
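As a rough numerical companion to this definition, the sketch below estimates the Wasserstein distance between samples of $R(x,\xi)$ and samples of $Z_\ell$. It rests on two assumptions that are not taken from this section: first, that $R(x,\xi) = -\sqrt{2\ell^2/N}\sum_j \zeta_j\xi_j - \frac{\ell^2}{N}\sum_j \xi_j^2$, which is our reading of (2.38); second, that the vector $\zeta$ can be replaced by one draw of i.i.d. standard normals. For two empirical samples of equal size, the one-dimensional Wasserstein-1 distance equals the mean absolute difference of the sorted samples, which is what `empirical_wasserstein1` computes. All names are hypothetical.

```python
import math
import random


def sample_R(zeta, ell, rng):
    """One draw of R(x, xi) under the assumed form of (2.38), for fixed zeta."""
    n = len(zeta)
    linear = 0.0
    quadratic = 0.0
    for z in zeta:
        xi = rng.gauss(0.0, 1.0)
        linear += z * xi
        quadratic += xi * xi
    return -math.sqrt(2.0 * ell ** 2 / n) * linear - (ell ** 2 / n) * quadratic


def empirical_wasserstein1(xs, ys):
    """Wasserstein-1 distance between two equal-size empirical samples."""
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def wass_R_vs_limit(n, ell=1.0, n_samples=4000, seed=0):
    """Estimate Wass(R(x, xi), Z_ell) for one fixed realization of zeta."""
    rng = random.Random(seed)
    zeta = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for zeta
    r_samples = [sample_R(zeta, ell, rng) for _ in range(n_samples)]
    z_samples = [rng.gauss(-ell ** 2, math.sqrt(2.0) * ell) for _ in range(n_samples)]
    return empirical_wasserstein1(r_samples, z_samples)


if __name__ == "__main__":
    for n in (10, 50, 200):
        print(n, wass_R_vs_limit(n))
```

The estimate mixes the true distance with sampling noise of order `n_samples ** -0.5`, so it should be read qualitatively rather than as a check of the rate.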

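Lemma 5.3. Almost surely with respect to $x \sim \pi$,

$$\mathrm{Wass}(R(x,\xi), Z_\ell) \le M\Big(\frac{1}{N^{3/2}}\sum_{j=1}^{N}|\zeta_j|^3 + \Big|1 - \frac{\|\zeta\|^2}{N}\Big| + \frac{1}{\sqrt{N}}\Big) \qquad (5.12)$$

$$\mathrm{Wass}(R(x,\xi), R_i(x,\xi)) \le \frac{M}{\sqrt{N}}\,(|\zeta_i| + 1). \qquad (5.13)$$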



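Proof. Define the Gaussian random variable $G = -\sqrt{\frac{2\ell^2}{N}}\sum_{k=1}^{N}\zeta_k\xi_k - \ell^2$. For any 1-Lipschitz function $f$,

$$|\mathbb{E}^{\xi}_0\big(f(G) - f(R(x,\xi))\big)| \le \ell^2\,\mathbb{E}\Big|1 - \frac{1}{N}\sum_{k=1}^{N}\xi_k^2\Big| \le M\frac{1}{\sqrt{N}}$$

implying that $\mathrm{Wass}(G, R(x,\xi)) \le M\frac{1}{\sqrt{N}}$. Now, from classical Berry-Esseen estimates (see [Str93]), we have
that

$$\mathrm{Wass}(G, Z_\ell) \le M\frac{1}{N^{3/2}}\sum_{j=1}^{N}|\zeta_j|^3 + M\Big|1 - \frac{\|\zeta\|^2}{N}\Big|.$$

Hence the proof of the first claim follows from the triangle inequality. To see the second claim, notice that
for any 1-Lipschitz function $f$ we have

$$\mathbb{E}^{\xi}_0|f(R(x,\xi)) - f(R_i(x,\xi))| \le \mathbb{E}^{\xi}_0|R(x,\xi) - R_i(x,\xi)| \le M\frac{1}{\sqrt{N}}\,(1 + |\zeta_i|)$$

and we are done.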



□ 






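Hence from Equations (5.13) and (5.12), we obtain

$$\mathrm{Wass}(R_i(x,\xi), Z_\ell) \le M\Big(\frac{|\zeta_i| + 1}{\sqrt{N}} + \frac{1}{N^{3/2}}\sum_{j=1}^{N}|\zeta_j|^3 + \Big|1 - \frac{\|\zeta\|^2}{N}\Big|\Big). \qquad (5.14)$$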

We conclude this section with the following observation which will be used later. Recall the Kolmogorov-Smirnov
(KS) distance between two random variables $(W, Z)$:

$$KS(W, Z) \overset{\text{def}}{=} \sup_{t \in \mathbb{R}} |\mathbb{P}(W \le t) - \mathbb{P}(Z \le t)|. \qquad (5.15)$$

Lemma 5.4. If a random variable $Z$ has a density with respect to the Lebesgue measure, bounded by a
constant $M$, then

$$KS(W, Z) \le \sqrt{4M\,\mathrm{Wass}(W, Z)}. \qquad (5.16)$$

We could not find a reference for the above in the published literature, so we include a short proof here,
taken from the unpublished lecture notes [Cha07].

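Proof. Fix $t \in \mathbb{R}$ and $\epsilon > 0$. Define two functions $g_1$ and $g_2$ as follows: $g_1(y) = 1$ for $y \in (-\infty, t]$, $g_1(y) = 0$ for
$y \in [t + \epsilon, \infty)$, with linear interpolation in between. Similarly, define $g_2(y) = 1$ for $y \in (-\infty, t - \epsilon]$, $g_2(y) = 0$
for $y \in [t, \infty)$, with linear interpolation in between. Then $g_1$ and $g_2$ form upper and lower envelopes for the
function $1_{(-\infty, t]}(y)$. So

$$\mathbb{P}(W \le t) - \mathbb{P}(Z \le t) \le \mathbb{E}\,g_1(W) - \mathbb{E}\,g_1(Z) + \mathbb{E}\,g_1(Z) - \mathbb{P}(Z \le t).$$

Since $g_1$ is $\frac{1}{\epsilon}$-Lipschitz we have $\mathbb{E}\,g_1(W) - \mathbb{E}\,g_1(Z) \le \frac{1}{\epsilon}\mathrm{Wass}(W, Z)$ and $\mathbb{E}\,g_1(Z) - \mathbb{P}(Z \le t) \le M\epsilon$ since $Z$
has density bounded by $M$. Similarly, using the function $g_2$ it follows that the same bound holds for the
difference $\mathbb{P}(Z \le t) - \mathbb{P}(W \le t)$. Optimizing over $\epsilon$, that is, taking $\epsilon = \sqrt{\mathrm{Wass}(W,Z)/M}$, yields the required bound. □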
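The bound (5.16) can be sanity-checked in a case where both distances are known in closed form. The sketch below takes $Z \sim N(0,1)$ and $W \sim N(\mu, 1)$, an example chosen by us and not taken from the paper. For a pure location shift the Wasserstein distance is $|\mu|$, the KS distance is $2\Phi(|\mu|/2) - 1$ (attained at $t = \mu/2$), and the density of $Z$ is bounded by $M = 1/\sqrt{2\pi}$.

```python
import math


def std_normal_cdf(x: float) -> float:
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ks_gaussian_shift(mu: float) -> float:
    """KS distance between N(mu, 1) and N(0, 1): sup_t |Phi(t - mu) - Phi(t)|."""
    return 2.0 * std_normal_cdf(abs(mu) / 2.0) - 1.0


def wass_gaussian_shift(mu: float) -> float:
    """Wasserstein-1 distance between N(mu, 1) and N(0, 1)."""
    return abs(mu)


def ks_upper_bound(mu: float) -> float:
    """Right-hand side of (5.16) with M = sup of the N(0, 1) density."""
    density_bound = 1.0 / math.sqrt(2.0 * math.pi)
    return math.sqrt(4.0 * density_bound * wass_gaussian_shift(mu))


if __name__ == "__main__":
    for mu in (0.001, 0.1, 1.0, 5.0):
        print(mu, ks_gaussian_shift(mu), ks_upper_bound(mu))
```

In this example the bound is far from sharp for small shifts, where the KS distance is linear in $\mu$ while the bound scales like $\sqrt{\mu}$; the square root is the price paid for allowing an arbitrary $W$.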

5.2. Rigorous Estimates for the Drift: Proof of Proposition 2.1, Equation (2.14) 

In the following, through a series of lemmas we retrace the arguments from Section 2.6 while deriving explicit 
bounds for the error terms. Lemma 5.11 at the end of the section gives control of the error terms. 

The following lemma shows that $Q(x,\xi)$ is well approximated by $R_i(x,\xi) - \sqrt{\frac{2\ell^2}{N}}\,\zeta_i\xi_i$, as indicated in
(2.40):

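Lemma 5.5.

$$N\,\mathbb{E}_0(x_i^1 - x_i^0) = \lambda_i\sqrt{2\ell^2 N}\;\mathbb{E}^{\xi}_0\Big(\big(1 \wedge e^{R_i(x,\xi) - \sqrt{2\ell^2/N}\,\zeta_i\xi_i}\big)\,\xi_i\Big) + \omega_0(i), \qquad |\omega_0(i)| \le \frac{M}{\sqrt{N}}\,\lambda_i.$$

Proof. We have

$$N\,\mathbb{E}_0(x_i^1 - x_i^0) = N\,\mathbb{E}_0\big(\gamma^0(y_i^0 - x_i^0)\big) = N\,\mathbb{E}^{\xi}_0\Big(\alpha(x,\xi)\sqrt{\tfrac{2\ell^2}{N}}\,(C^{1/2}\xi)_i\Big)$$
$$= \lambda_i\sqrt{2\ell^2 N}\;\mathbb{E}^{\xi}_0\big(\alpha(x,\xi)\,\xi_i\big) = \lambda_i\sqrt{2\ell^2 N}\;\mathbb{E}^{\xi}_0\big((1 \wedge e^{Q(x,\xi)})\,\xi_i\big).$$

Now we observe that

$$\mathbb{E}^{\xi}_0\big((1 \wedge e^{Q(x,\xi)})\,\xi_i\big) = \mathbb{E}^{\xi}_0\Big(\big(1 \wedge e^{R_i(x,\xi) - \sqrt{2\ell^2/N}\,\zeta_i\xi_i}\big)\,\xi_i\Big) + \frac{\omega_0(i)}{\lambda_i\sqrt{2\ell^2 N}}.$$

By (2.32) and (2.40),

$$\Big|Q(x,\xi) - R_i(x,\xi) + \sqrt{\tfrac{2\ell^2}{N}}\,\zeta_i\xi_i\Big|^2 \le \frac{M}{N^2}\big(|\xi_i|^4 + \|C^{1/2}\xi\|_s^4\big). \qquad (5.17)$$

Noticing that the map $y \mapsto 1 \wedge e^y$ is Lipschitz we obtain

$$|\omega_0(i)| \le M\lambda_i\sqrt{N}\;\mathbb{E}^{\xi}_0\Big|\Big((1 \wedge e^{Q(x,\xi)}) - \big(1 \wedge e^{R_i(x,\xi) - \sqrt{2\ell^2/N}\,\zeta_i\xi_i}\big)\Big)\,\xi_i\Big| \le \frac{M}{\sqrt{N}}\,\lambda_i,$$

where the last inequality follows from (5.17) and the lemma is proved. □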



The next lemma takes advantage of the fact that $R_i(x,\xi)$ is independent of $\xi_i$ conditional on $x$. Thus
using the identity (2.36) we obtain the bound for the approximation made in (2.41):



Lemma 5.6. 



■2i- 



§ (lAe 



)6 =-V^C 



AT Si ^0 



JV (j) 



+ wi(i) (5.18) 



Mi)l<M|C ? fie£lKII\ 



Proof. Applying Equation (2.36) with $a = -\sqrt{\frac{2\ell^2}{N}}\,\zeta_i$, $z = \xi_i$ and $b = R_i(x,\xi)$, we obtain the identity,



3((lAe*<*0-V#*C«)ei) 



Hp V jv 



101 



(5.19) 



Now we observe that 



< Eg (e _ VT2 J j=lj j ii'.]?J + ¥^ ) = gTv IKII 

Since $\Phi$ is globally Lipschitz, it follows that



(5.20) 



« $ — 



iV 



JV (j) 



■cJi(i) 



< MIOI^E^ e *C*.0+^ < MlGl^e- 



IIC II 2 



A r 



(5.21) 

where the last estimate follows from (5.20). The lemma follows from (5.19) and (5.20). □ 
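The Gaussian identity used at the start of this proof can be verified numerically. The sketch below assumes that (2.36), which is not reproduced in this section, reads $\mathbb{E}\big[z\,(1 \wedge e^{az+b})\big] = a\,e^{a^2/2 + b}\,\Phi\big(-\tfrac{b}{|a|} - |a|\big)$ for $z \sim N(0,1)$ and real $a \neq 0$, $b$; this form follows from Gaussian integration by parts, but its match with the paper's (2.36) is our assumption. The left-hand side is evaluated with a plain trapezoidal rule; names and grid parameters are illustrative.

```python
import math


def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lhs_by_quadrature(a: float, b: float, half_width: float = 12.0,
                      n_steps: int = 40_000) -> float:
    """E[z * min(1, exp(a z + b))] for z ~ N(0, 1), via the trapezoidal rule."""
    h = 2.0 * half_width / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, n_steps) else 1.0
        exponent = a * z + b
        accept = 1.0 if exponent >= 0.0 else math.exp(exponent)
        total += weight * z * accept * std_normal_pdf(z)
    return total * h


def rhs_closed_form(a: float, b: float) -> float:
    """Assumed form of identity (2.36)."""
    return a * math.exp(0.5 * a * a + b) * std_normal_cdf(-b / abs(a) - abs(a))


if __name__ == "__main__":
    for a, b in ((-0.5, -0.3), (0.8, 0.2), (-1.5, 1.0)):
        print(a, b, lhs_by_quadrature(a, b), rhs_closed_form(a, b))
```

In the setting of Lemma 5.6 one would take $a = -\sqrt{2\ell^2/N}\,\zeta_i$ and $b = R_i(x,\xi)$, so $a$ is small and the prefactor $a\,e^{a^2/2+b}$ carries the $N^{-1/2}$ scaling.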
The next few lemmas are technical and give quantitative bounds for the approximations in (2.43)-(2.44). 
Lemma 5.7. 



|w 2 (i)| <Me^^ll 2 (|0| + l) 
'(l + liifoOlvw) 3



1/4 



Proof. We first prove the following lemma needed for the proof: 



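Lemma 5.8. Let $\phi(\cdot)$ and $\Phi(\cdot)$ denote the pdf and CDF of the standard normal distribution respectively.
Then we have:

1. For any $x \in \mathbb{R}$, $|\Phi(-x) - 1_{x<0}| = |1 - \Phi(|x|)|$.

2. For any $x > 0$ and $\epsilon > 0$, $1 - \Phi(x) \le \frac{1+\epsilon}{x+\epsilon}$.

Proof. For the first claim notice that if $x > 0$, $|\Phi(-x) - 1_{x<0}| = |\Phi(-x)| = |1 - \Phi(|x|)|$. If $x < 0$, $|\Phi(-x) -
1_{x<0}| = |1 - \Phi(|x|)|$ and the claim follows.
For the second claim,

$$1 - \Phi(x) = \int_x^{\infty}\phi(u)\,du \le \int_x^{\infty}\frac{u+\epsilon}{x+\epsilon}\,\phi(u)\,du \le \frac{1}{x+\epsilon}\int_0^{\infty}u\,\phi(u)\,du + \frac{\epsilon}{x+\epsilon}\int_{-\infty}^{\infty}\phi(u)\,du \le \frac{1+\epsilon}{x+\epsilon}$$

since $\int_0^\infty u\,\phi(u)\,du \le 1$ and $\int_{-\infty}^{\infty}\phi(u)\,du = 1$. □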
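Both claims of Lemma 5.8 are easy to check on a grid. The sketch below assumes that the second claim reads $1 - \Phi(x) \le (1+\epsilon)/(x+\epsilon)$ for $x > 0$, $\epsilon > 0$; the right-hand side is damaged in our copy of the text, and this form is inferred from the way the lemma is applied in (5.23). The function names and grids are ours.

```python
import math


def std_normal_cdf(x: float) -> float:
    """CDF of the standard normal, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def gaussian_tail(x: float) -> float:
    """1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def tail_bound(x: float, eps: float) -> float:
    """Assumed right-hand side of the second claim: (1 + eps) / (x + eps)."""
    return (1.0 + eps) / (x + eps)


def first_claim_gap(x: float) -> float:
    """Absolute difference between the two sides of the first claim."""
    indicator = 1.0 if x < 0.0 else 0.0
    left = abs(std_normal_cdf(-x) - indicator)
    right = abs(1.0 - std_normal_cdf(abs(x)))
    return abs(left - right)


if __name__ == "__main__":
    for x in (0.01, 0.5, 1.0, 2.0, 5.0):
        for eps in (0.01, 1.0, 10.0):
            print(x, eps, gaussian_tail(x), tail_bound(x, eps))
```

The bound is crude compared with the usual Mills-ratio estimate, but it stays finite as $x \to 0$, which is what the application with $x$ proportional to $|R_i(x,\xi)|\sqrt{N}$ requires.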
We now proceed to the proof of Lemma 5.7. By Cauchy-Schwarz and an estimate similar to (5.20),



MO I < Eo r 



lfli(a;,5)<0 - * 



< 



E «i e 2i? I ( :C ,5)+2VCi 



nl/3 









AT I 1 ' 4 



1/2 



<Me™ ! 



liii(x,5)<o - $ 



\ fie? i /- i / 



1/2 



1/2 



(5.22) 



where the last two observations follow from the computation done in (5.20) and the fact that 



< 1. 



By applying Lemma 5. 8, with e 



(z,€)<o 



v^i|C<l 



< (1 + \/2£|Ci|) 



\ + \Ri(x,£)WN 



(5.23) 



The right hand side of the estimate (5.23) depends on $i$ but we need estimates which are independent of $i$.
In the next lemma, we replace $R_i(x,\xi)$ by $R(x,\xi)$ and control the extra error term.

Lemma 5.9. 



e; 



l + \Ri{x,£)\VN 



< Af (1 + IC<|) 



1 



-,1/2 



(i + li?(x,e)iviv) 



(5.24) 



Proof. We write 



3 



^= =E«- 



l + lJZifoOI-s/JV °l + \Ri(x,t)\<</N 
= Ef 1 



i + \r(x,£)\Vn 



< 



(l + \R(x^)\VN) 2 



1/2 



(5.25) 
<4-



l + \Ri(x,0\VN l + |i2(a:,0|vJV 
V2£\Cm\ + ^Zi 2 



{l + \Ik(x,Z)\VN)(l + \R(x,€)\VN) 



(i + \r(x,z)\vn) 



<M(|G| + 1) 



En 



1 



'{i + \R(x,t)\VN) 2 



1/2 



and the claim follows from (5.25) and (5.26). 

Now by applying the estimates obtained in (5.22), (5.23) and (5.24), we obtain 



(5.26) 
□ 



|«a(i)| <Me^^\\Q\ + l) 



En 



{l + \R(x,0\VN) 



1/4 



and the lemma is proved. 



□ 



The error estimate in $\omega_2$ has $R(x,\xi)$ instead of $R_i(x,\xi)$. This bound can be achieved because the terms
$R_i(x,\xi)$ for all $i \in \mathbb{N}$ have the same weak limit as $R(x,\xi)$, and thus the additional error term due to the
replacement of $R_i(x,\xi)$ by $R(x,\xi)$ in the expression can be controlled uniformly over $i$ for large $N$.

Lemma 5.10. 



N 



Mi) | < M T7 e — + M 



ICI 



2.1 



AT 



Proof. Set $g(y) = e^{y}\,1_{y<0}$. We first need to estimate the following:

$$|\mathbb{E}^{\xi}_0\,g(R_i(x,\xi)) - \mathbb{E}\,g(Z_\ell)|.$$

Notice that the function $g(\cdot)$ is not Lipschitz and therefore the Wasserstein bounds obtained earlier cannot
be used directly. However we use the fact that the normal distribution has a density which is bounded above.
So by Lemma 5.3, (5.14) and (5.16),

KS(R t (x,0, Z t ) < 2My/Wass(R i (x,Z),Z t ) < m(^M + ^_ £ |£|3 + |1 - Kj 



N 



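Since $g$ is positive on $(-\infty, 0]$, for a real valued continuous random variable $X$,

$$\mathbb{E}(g(X)) = \int_{-\infty}^{0} g'(t)\,\mathbb{P}(X > t)\,dt - g(0)\,\mathbb{P}(X > 0).$$

Hence,

$$|\mathbb{E}^{\xi}_0\,g(R_i(x,\xi)) - \mathbb{E}\,g(Z_\ell)| \le \int_{-\infty}^{0} g'(t)\,\big|\mathbb{P}(R_i(x,\xi) > t) - \mathbb{P}(Z_\ell > t)\big|\,dt + g(0)\,\big|\mathbb{P}(R_i(x,\xi) > 0) - \mathbb{P}(Z_\ell > 0)\big|$$
$$\le KS(R_i(x,\xi), Z_\ell)\Big(\int_{-\infty}^{0} g'(t)\,dt + g(0)\Big) \le M\,KS(R_i(x,\xi), Z_\ell).$$

Hence putting the above calculations together and noticing that $\mathbb{E}(e^{Z_\ell}\,1_{Z_\ell<0}) = \beta/2$, we have just shown
that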



E^^l^^xo) 



< M, 
N



Notice that 



Mi)| < \e" 2 ^EUe R ^h R ^ )<0 )-l3/2\ 



< - 1| ^(e^^l^^xo)! + |Eg( e ^«>l^ )<0 ) - /3/2| 

< Af|e^ + ^(e^l*.^,,) - /3/2| 

where the last bound follows from (5.20), proving the claimed error bound for $\omega_3(i)$. □

For deriving the error bounds on $\omega_3$, we cannot directly apply the Wasserstein bounds obtained in equation
(5.14), because the function $y \mapsto e^{y}\,1_{y<0}$ is not Lipschitz on $\mathbb{R}$. However using (5.16), the KS distance between
$R_i(x,\xi)$ and $Z_\ell$ is bounded by the square root of the Wasserstein distance. Thus using the fact that $e^{y}\,1_{y<0}$
is bounded and positive, we bound the expectation in Lemma 5.10 by the KS distance.
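The value $\mathbb{E}(e^{Z_\ell} 1_{Z_\ell<0}) = \beta/2$ used in the proof of Lemma 5.10 can be confirmed by direct quadrature. The sketch below assumes $Z_\ell \sim N(-\ell^2, 2\ell^2)$ as in the text and uses the closed form $\beta/2 = \Phi(-\ell/\sqrt{2})$, which is a standard Gaussian computation that we assume rather than quote from the paper. The integral is taken over the standardized variable up to the point where $Z_\ell = 0$, so the integrand is smooth on the whole integration range.

```python
import math


def std_normal_pdf(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def exp_z_on_negative_half(ell: float, half_width: float = 12.0,
                           n_steps: int = 40_000) -> float:
    """E[exp(Z) 1{Z < 0}] for Z ~ N(-ell^2, 2 ell^2), by the trapezoidal rule.

    Writes Z = mean + std * u with u ~ N(0, 1) and integrates over
    u in [-half_width, upper], where upper is the value of u at which Z = 0.
    """
    mean = -ell ** 2
    std = math.sqrt(2.0) * ell
    upper = -mean / std
    lower = -half_width
    h = (upper - lower) / n_steps
    total = 0.0
    for k in range(n_steps + 1):
        u = lower + k * h
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * math.exp(mean + std * u) * std_normal_pdf(u)
    return total * h


def half_beta(ell: float) -> float:
    """Assumed closed form beta / 2 = Phi(-ell / sqrt(2))."""
    return std_normal_cdf(-ell / math.sqrt(2.0))


if __name__ == "__main__":
    for ell in (0.5, 1.0, 2.0):
        print(ell, exp_z_on_negative_half(ell), half_beta(ell))
```

The symmetry behind the factor $1/2$ is that $\mathbb{E}(e^{Z_\ell} 1_{Z_\ell<0})$ and $\mathbb{P}(Z_\ell \ge 0)$ coincide for this particular mean and variance, so the two parts of $\mathbb{E}(1 \wedge e^{Z_\ell})$ contribute equally.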

Combining all the above estimates, we see that

$$N\,\mathbb{E}_0[x_i^1 - x_i^0] = -\ell^2\beta\,\big(P^N x + C\nabla\Psi(P^N x)\big)_i + r_i^N \qquad (5.27)$$

with

$$|r_i^N| \le |\omega_0(i)| + M\lambda_i\big(\sqrt{N}\,|\omega_1(i)| + |\zeta_i|\,|\omega_2(i)| + |\zeta_i|\,|\omega_3(i)|\big). \qquad (5.28)$$

The following lemma gives the control over $r^N$ and completes the proof of Equation (2.14), Proposition 2.1.

Lemma 5.11. For $s < \kappa - 1/2$,

$$\lim_{N\to\infty} \mathbb{E}^{\pi^N}\|r^N\|_s^2 = \lim_{N\to\infty} \mathbb{E}^{\pi^N}\sum_{i=1}^{N} i^{2s}\,|r_i^N|^2 = 0.$$

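Proof. By (5.28), we have $|r_i^N| \le |\omega_0(i)| + M\lambda_i\big(\sqrt{N}\,|\omega_1(i)| + |\zeta_i|\,|\omega_2(i)| + |\zeta_i|\,|\omega_3(i)|\big)$. Therefore

$$\sum_{i=1}^{N}\mathbb{E}^{\pi^N} i^{2s}\,|r_i^N|^2 \le M\,\mathbb{E}^{\pi^N}\sum_{i=1}^{N}\Big(i^{2s}\,|\omega_0(i)|^2 + i^{2s}\lambda_i^2\big(N\,\omega_1(i)^2 + \zeta_i^2\,\omega_2(i)^2 + \zeta_i^2\,\omega_3(i)^2\big)\Big). \qquad (5.29)$$

Now we will evaluate each sum on the right hand side of the above equation and show that they converge to
zero.

• Since $\sum_{i=1}^{\infty}\lambda_i^2\, i^{2s} < \infty$,

$$\sum_{i=1}^{N}\mathbb{E}^{\pi^N} i^{2s}\,|\omega_0(i)|^2 \le M\frac{1}{N}\sum_{i=1}^{N} i^{2s}\lambda_i^2 \le M\frac{1}{N}\sum_{i=1}^{\infty}\lambda_i^2\, i^{2s} \to 0. \qquad (5.30)$$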

• By Lemma 5.6 and Lemma 5.2, 

JV JV 



NE^ >H ^ M*)| 2 < " X A J ^ IO| 4 e^ llC " 2 0. (5.31) 

i=l i=l 

• From Lemma 5.7 and Cauchy-Schwarz, we obtain

N 1 , ,„ JV . ,„ 

^ Wl " V L °(l + | J R( a; ,0|v / lV) 2 ^ ^ ' 



i=l ^ 1 |- , "V*>S^I v ^ v i=1 
1 /2

Proceeding similarly as done in Lemma 5.2 it follows that J2iLi 6^ e^r^ Xf i As (|Cj| 8 + 1)) i s 



v 



bounded in N. Since, with x ~ ttq, R(x,£) converges weakly to Za as N — > oo, by the bounded 



convergence theorem we obtain lim 



JV->oo ■ 



lim W 1 



N- 



J 0(l+|rt(^£)j7JVp 
1 



'(l + \R(x,0\VN) 2 



and thus by Lemma 3.5 
= 0. 



Therefore we deduce that 



N 



lim |C:| 2 * 2s A^ 2 W| 2 = 0. 

iv— »-oo * — ' 



(5.32) 



• After some algebra we obtain from Lemma 5.10 that, 



N 



N 



e* n y,^ 2s i^i 2 m*)i 2 < m^j2 e X ^ 2s iCii 6 ^ 



M-^E^^A?^Cf(l + lal) 

* i— 1 



JV 



(=1 



i=l 



+ M 



ICII 



jV 



JV 



Similar to the previous calculations, using Lemma 5.2, it is quite straightforward to verify that each of 
the four terms above converges to 0. Thus we obtain, 



JV 



hm A^|C:| 2 M*)| 2 =0. 



JV-S- 



(5.33) 



Now the proof of Lemma 5.11 follows from (5.29), (5.30), (5.31), (5.32) and (5.33). □

This completes the proof of Proposition 2.1, Equation (2.14).

5.3. Rigorous Estimates for the Diffusion Coefficient: Proof of Proposition 2.1, Equation (2.15)



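Recall that for $1 \le i, j \le N$,

$$N\,\mathbb{E}_0\big[(x_i^1 - x_i^0)(x_j^1 - x_j^0)\big] = 2\ell^2\,\mathbb{E}^{\xi}_0\Big[(C^{1/2}\xi)_i\,(C^{1/2}\xi)_j\,\big(1 \wedge \exp Q(x,\xi)\big)\Big].$$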



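The following lemma quantifies the approximations made in (2.48) and (2.49).

Lemma 5.12.

$$\mathbb{E}^{\xi}_0\Big[(C^{1/2}\xi)_i\,(C^{1/2}\xi)_j\,\big(1 \wedge \exp Q(x,\xi)\big)\Big] = \lambda_i\lambda_j\delta_{ij}\,\mathbb{E}^{\xi}_0\Big[\big(1 \wedge \exp R_{ij}(x,\xi)\big)\Big] + \theta_{ij}$$
$$\mathbb{E}^{\xi}_0\big(1 \wedge \exp R_{ij}(x,\xi)\big) = \beta + \rho_{ij}$$

where the error terms satisfy

$$|\theta_{ij}| \le M\lambda_i\lambda_j\,\big(1 + |\zeta_i|^2 + |\zeta_j|^2\big)^{1/2}\frac{1}{\sqrt{N}} \qquad (5.34)$$

$$|\rho_{ij}| \le M\Big(\frac{1}{\sqrt{N}}\big(1 + |\zeta_i| + |\zeta_j|\big) + \frac{1}{N^{3/2}}\sum_{s=1}^{N}|\zeta_s|^3 + \Big|1 - \frac{\|\zeta\|^2}{N}\Big|\Big). \qquad (5.35)$$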
(5.35) 
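Proof. We first derive the bound for $\theta_{ij}$. Indeed,

$$|\theta_{ij}| \le \mathbb{E}^{\xi}_0\Big[\big|(C^{1/2}\xi)_i\,(C^{1/2}\xi)_j\,\big((1 \wedge e^{Q(x,\xi)}) - (1 \wedge e^{R_{ij}(x,\xi)})\big)\big|\Big] \le M\lambda_i\lambda_j\,\mathbb{E}^{\xi}_0\Big[\big|\xi_i\xi_j\,\big((1 \wedge e^{Q(x,\xi)}) - (1 \wedge e^{R_{ij}(x,\xi)})\big)\big|\Big].$$

By the Cauchy-Schwarz inequality

$$|\theta_{ij}| \le M\lambda_i\lambda_j\,\Big(\mathbb{E}^{\xi}_0\big|(1 \wedge e^{Q(x,\xi)}) - (1 \wedge e^{R_{ij}(x,\xi)})\big|^2\Big)^{1/2} \le M\lambda_i\lambda_j\,\Big(\mathbb{E}^{\xi}_0\,|Q(x,\xi) - R_{ij}(x,\xi)|^2\Big)^{1/2}.$$

Using the estimate obtained in Equation (5.2),

$$|\theta_{ij}| \le M\lambda_i\lambda_j\,\big(1 + \zeta_i^2 + \zeta_j^2\big)^{1/2}\frac{1}{\sqrt{N}}$$

verifying (5.34).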

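Now we turn to verifying the error bound in (5.35). We need to bound:

$$\mathbb{E}^{\xi}_0\big(g(R_{ij}(x,\xi)) - g(Z_\ell)\big),$$

where $g(y) = 1 \wedge e^{y}$. Notice that $\mathbb{E}(g(Z_\ell)) = \beta$. Since $g(\cdot)$ is Lipschitz,

$$|\mathbb{E}^{\xi}_0\big(g(R_{ij}(x,\xi)) - g(Z_\ell)\big)| \le M\,\mathrm{Wass}(R_{ij}(x,\xi), Z_\ell). \qquad (5.36)$$

A simple calculation will yield that

$$\mathrm{Wass}(R_{ij}(x,\xi), R(x,\xi)) \le M\,\big(|\zeta_i| + |\zeta_j| + 1\big)\frac{1}{\sqrt{N}}.$$

Therefore by the triangle inequality and Lemma 5.3,

$$\mathrm{Wass}(R_{ij}(x,\xi), Z_\ell) \le M\Big(\frac{1}{\sqrt{N}}\big(1 + |\zeta_i| + |\zeta_j|\big) + \frac{1}{N^{3/2}}\sum_{r=1}^{N}|\zeta_r|^3 + \Big|1 - \frac{\|\zeta\|^2}{N}\Big|\Big).$$

Hence the estimate in (5.35) follows from the observation made in Equation (5.36) and we are done. □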
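Putting together all the estimates produces

$$N\,\mathbb{E}_0\big[(x_i^1 - x_i^0)(x_j^1 - x_j^0)\big] = 2\ell^2\beta\,\lambda_i\lambda_j\delta_{ij} + E^N_{ij} \quad\text{and}\quad |E^N_{ij}| \le M\big(|\theta_{ij}| + \lambda_i\lambda_j\delta_{ij}\,|\rho_{ij}|\big). \qquad (5.37)$$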

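Finally we estimate the error $E^N$:

Lemma 5.13. We have

$$\lim_{N\to\infty}\sum_{i=1}^{N}\mathbb{E}^{\pi^N}\big|\langle\phi_i, E^N\phi_i\rangle_s\big| = 0, \qquad \lim_{N\to\infty}\mathbb{E}^{\pi^N}\big|\langle\phi_i, E^N\phi_j\rangle_s\big| = 0$$

for any pair of indices $i, j$.

Proof. From (5.37) we obtain that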



N 



N 



2—1 i— 1 i— 1 

AT N N 

5^E^i 2 '|fl«| < ME E7ro |f»K 2s < M^E I »A?i Js (l| |Ci| 2 ) 1/2 ^= 

i=l i=l i=l 



(5.38) 
N 1

<M53E*»A?i 2 '(l + |Ci|)-^=->0 (5.39) 



i=l 

due to the fact that $\sum_{i=1}^{\infty}\lambda_i^2\, i^{2s} < \infty$ and Lemma 5.2. Now the second term of (5.38):

N N N 

i=l i=l * s=l 



TV 



The first term above goes to zero by (5.39) and the last term converges to zero by the same arguments
used in Lemma 5.2. As mentioned in the proof of the estimate for the term $\omega_3$ in Lemma 5.11, the sum
$\frac{1}{N^{3/2}}\sum_{s=1}^{N}|\zeta_s|^3$ goes to zero. Therefore we have shown that:

$$\lim_{N\to\infty}\sum_{i=1}^{N}\mathbb{E}^{\pi^N}\big|\langle\phi_i, E^N\phi_i\rangle_s\big| = 0$$

proving the first claim. Finally from (5.34) it immediately follows that

$$\mathbb{E}^{\pi^N}\big|\langle\phi_i, E^N\phi_j\rangle_s\big| \le M\,\mathbb{E}^{\pi^N}\, i^{s} j^{s}\,|\theta_{ij}| \to 0$$

proving the second claim as well and we are done. □
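Therefore we have shown

$$N\,\mathbb{E}_0\big[(x_i^1 - x_i^0)(x_j^1 - x_j^0)\big] = 2\ell^2\beta\,\langle\phi_i, C\phi_j\rangle + E^N_{ij},$$

$$\lim_{N\to\infty}\sum_{i=1}^{N}\mathbb{E}^{\pi^N}\big|\langle\phi_i, E^N\phi_i\rangle_s\big| = 0.$$

This finishes the proof of Proposition 2.1, Equation (2.15).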